Abstract

Quantitative methods for studying biodiversity have been traditionally rooted in the classical theory of finite 
frequency tables analysis. However, with the help of modern experimental tools, like high throughput sequencing, 
we now begin to unlock the outstanding diversity of genomic data in plants and animals reflective of the long 
evolutionary history of our planet. These molecular data often defy the classical frequency/contingency table
assumptions and seem to require sparse tables with a very large number of categories and highly unbalanced cell
counts, e.g., following heavy-tailed distributions (for instance, power laws). Motivated by the molecular diversity
studies, we propose here a frequency-based framework for biodiversity analysis in the asymptotic regime where 
the number of categories grows with sample size (an infinite contingency table). Our approach is rooted in 
information theory and based on the Gaussian limit results for the effective number of species (the Hill numbers) 
and the empirical Renyi entropy and divergence. We argue that when applied to molecular biodiversity analysis 
our methods can properly account for the complicated data frequency patterns on one hand and the practical 
sample size limitations on the other. We illustrate this principle with two specific RNA sequencing examples: a 
comparative study of T-cell receptor populations and a validation of some preselected molecular hepatocellular 
carcinoma (HCC) markers. 
1 Introduction


Developing effective methods for quantifying and comparing the empirical diversity of various biological populations
is one of the fundamental problems of modern life sciences, as it has a direct impact on our understanding of the basic
operating principles of our planet's ecosystem and its evolution (cf., e.g., Berkov et al., 2014). In the course of its 3.5
billion years of evolutionary history, nature has developed an outstanding bio- and molecular diversity among the
Earth's species of plants and animals. Indeed, it is estimated that there are currently about 8.7 million eukaryotic
species on Earth, both marine and terrestrial, 88% of which are still waiting to be described (Mora et al., 2011).
The diversity at the molecular level is perhaps even more spectacular, as it occurs at different levels of biological
organization: within one individual (e.g., through RNA, DNA, proteins, and metabolites), between individuals of
the same and related species, within and between species and ecosystems, as well as throughout evolution (see,
e.g., Campbell, 2003). For instance, the number of different molecular types of human T-cells is estimated at 10^18
(Janeway, 2005), which is only slightly less than the currently estimated number of stellar objects in the known universe
(the latter believed to be of the order 10~ 1).

Whereas the power of modern computing has allowed us to make steady progress towards building ever more
robust empirical measures of biodiversity based on a variety of considerations (see, e.g., Presley et al., 2014), the most
relevant to our discussion here are the measures borrowed from the field of information theory. They include, among
others, the Hill numbers (or the effective number of species) and the related concept of the Renyi entropy (see, e.g.,
the recent review Chiu et al. (2014) and references therein). Although originally proposed for quantifying ecological
diversity in macro-scale ecosystems (Chao et al., 2010), the empirical Renyi entropy was also adopted as a descriptor
of diversity for molecular populations in de Andrade and Wang (2011). Since then, Renyi-type measures have been
applied to problems in molecular populations ranging from analyzing regulatory variants and testing
genome-wide associations (Sun and Hu, 2013; Sadee et al., 2014) to comparing different T-cell populations (Cebula
et al., 2013; Rempala and Seweryn, 2013). Despite their growing usage in biodiversity studies of both macro- and
molecular-level populations, it appears that some important statistical properties of the Renyi-type measures have
not yet been sufficiently understood, especially in the context of frequency-based analysis and large-sample behavior.

Currently, standard methods of obtaining molecular-level data on transcriptome (RNA) abundance rely on
the so-called next-generation sequencing (NGS) technology, especially high-throughput RNA sequencing, or
RNA-seq (Wang et al., 2009). However, the molecular count data from NGS often elude standard statistical analysis
because exhaustive sampling of the DNA and RNA fragments for the purpose of sequence reconstruction
is not feasible and because sequencing errors increase with sampling intensity or sequencing depth (O'Rawe et al.,
2015). It has therefore been generally conceded (Oh et al., 2014) that the standard, fixed-dimension, non-parametric
frequency/contingency table analysis (see, e.g., Agresti, 2002) does not readily apply to NGS data and that a
different, infinite-size contingency table framework, more reflective of current sequencing technology, appears
necessary. Due to the nature of NGS methods, such a framework should be based on large-sample (high-throughput)
considerations but, at the same time, should also account for the increase in the number of sequencing
errors with increasing sample size as well as for the under-sampling bias.

Motivated by questions on comparing biodiversity in molecular data (especially data arising from NGS
experiments), in the current paper we establish some large-sample results for the empirical Renyi entropy and divergence
in order to bridge the gap between current heuristic approaches and a more formal statistical theory of large samples.
To this end, we derive herein several central limit theorems (CLTs) which yield approximate confidence bounds for
the (Renyi) entropy-based measures of diversity and similarity in the setting of an infinite contingency table. Our
CLT results complement both the law of large numbers theorems in Rempala and Seweryn (2013) and the
CLTs for the plug-in estimates of the Shannon entropy (Zhang and Zhang, 2012) and of the Kullback-Leibler divergence
(Paninski, 2003; Zhang and Grabchak, 2014). Since in NGS experiments one typically expects to
under-sample the transcriptome, we focus here on Renyi entropy exponents (denoted below by $\alpha$) less
than one, so as to up-weight the contributions of the lower counts, and our CLT results are restricted to this case.
The extensions to arbitrary exponents are straightforward but not considered here. In order to provide examples
of the types of applications motivating the mathematical results, we analyze two real biological datasets from two
different types of NGS experiments. In the first experiment, described in Cebula et al. (2013), one compares
multiple T-cell receptor populations taken from mice before and after treatment with antibiotics. The goal of the
second experiment is the elucidation of differences in gene expression profiles between cancer and control tissues
in individuals with hepatocellular carcinoma, as described in Chan et al. (2014). In both examples the
NGS datasets are analyzed and de-noised by applying a multi-stage process developed on the basis of our theoretical
results.

As already indicated above, the problem of empirically estimating entropy and divergence has been extensively
studied in the statistical and machine learning literature over the past several decades, both in the context of discrete
could be expected to hold in the discrete case as well. The main difference between these types of results and
what is considered here is that in our setting the discrete density function $f$ is allowed to change as the sample size $n$
increases. Additionally, in the current paper we only analyze the basic empirical frequency (the so-called
plug-in) estimates.

The paper is organized as follows. In the next section (Section 2) we outline the relevant mathematical concepts 
along with the necessary notation. In Section 3 we state the main theoretical results of the paper, namely the CLTs 
for the Hill number (or the Tsallis entropy) and the Renyi entropy and divergence in the asymptotic regime when 
the diversity of the population (i.e., the number of different types) grows with the sample size. The results for the 
simpler case (Theorems 1 and 2) when Renyi entropy statistics admit linear approximations are established via the 
intermediate CLT results for the corresponding power sums which are closely related to the CLTs for Hill’s numbers 
and Tsallis' entropies. These results are also included as parts of the formulations of Theorems 1 and 2. In the case of
the uniform distribution for the Renyi entropy as well as the equal-marginals bivariate distribution for the Renyi 
divergence, the power sum CLTs are no longer valid (there is no linear approximation available) and other methods 
are required to establish weak convergence to Gaussian variates under slightly more stringent conditions. These 
results are presented as Theorems 3 and 4 in Section 3. As it turns out, the key ingredient needed to establish 
Theorems 3 and 4 is the CLT result for two Pearson-type chi-square statistics in an infinite contingency table. This 
latter result is of interest in itself and is presented as Lemma 2 in Section 3. In the following Section 4, we provide 
some simulation-based examples of the asymptotic behavior of estimates from Section 3 in the case (relevant for our 
applications) of power law distributions under various sampling scenarios. These examples illustrate in particular 
how the CLTs of Section 3 may hold or not, depending on the relations between the dimensions of the relevant 
contingency tables and the empirical sample sizes. In the second part of Section 4 we also discuss in detail the two 
biological examples of NGS data analysis and show how the results of Section 3 may be used to analyze biodiversity 
of T-cell receptors and to profile the multiple sets of transcriptomes. The final Section 5 offers a summary and brief 
conclusions. The proofs of the more complicated results are provided in the appendix along with some auxiliary
technical lemmas. 

2 Power Sums, Entropy and Divergence 

Consider a triangular array of bivariate row-wise independent random variables $Z_{n,k}$ for $k = 1,\dots,n$, which in each
row are equidistributed with the random variable $Z_n = (X_n, Y_n)$ such that $P(X_n = i, Y_n = j) = p_{ij}$ for $i, j = 1,\dots,m$.
Below we suppress the index $n$ when possible, writing, e.g., $m, Z_k, Z, p_{ij}$, etc. for simplicity.
Let $\alpha > 0$ and for any probability distribution $p = (p_i)_{i=1}^m$ define

$$\mathcal{P}_\alpha(p) = \sum_{i=1}^m p_i^\alpha. \tag{2.1}$$

Similarly, for any pair of distributions $p = (p_i)_{i=1}^m$ and $q = (q_i)_{i=1}^m$ define

$$\mathcal{P}_\alpha(p,q) = \sum_{i=1}^m p_i^\alpha q_i^{1-\alpha}. \tag{2.2}$$

(Note that $\mathcal{P}_1(p) = \mathcal{P}_1(p,q) = 1$.) The well-known special case of the above is $\alpha = 1/2$, which results in the symmetric index
$\mathcal{P}_{1/2}(p,q) = \mathcal{P}_{1/2}(q,p) = \sum_i \sqrt{p_i q_i}$, often referred to as the Bhattacharyya coefficient (see, e.g.,
Nielsen and Boltz, 2011).
Recall (Renyi, 1961) that for a given distribution $p$ its Renyi entropy $\mathcal{H}_\alpha$ is defined as

$$\mathcal{H}_\alpha(p) = \frac{1}{1-\alpha}\log\Big(\sum_i p_i^\alpha\Big) = \frac{1}{1-\alpha}\log \mathcal{P}_\alpha(p)$$

and that for a pair of distributions $(p,q)$ their Renyi divergence $\mathcal{D}_\alpha$ is defined as

$$\mathcal{D}_\alpha(p,q) = \frac{1}{\alpha-1}\log \mathcal{P}_\alpha(p,q).$$

Note that the sign change in the normalizing constant is needed in order to ensure non-negativity of $\mathcal{H}_\alpha$ and $\mathcal{D}_\alpha$.
The special case of $\mathcal{D}_\alpha$ with $\alpha = 1/2$, namely $\mathcal{D}_{1/2}(p,q) = -2\log\sum_i \sqrt{p_i q_i}$, equals twice the Bhattacharyya distance and may be expressed in terms of
the Mahalanobis distance (see, e.g., Nielsen and Boltz, 2011), whereas the linear approximation of $\mathcal{H}_\alpha(p)$ given by

$$\mathcal{T}_\alpha(p) = \frac{1}{1-\alpha}\big(\mathcal{P}_\alpha(p) - 1\big) \tag{2.3}$$

is sometimes referred to as the Tsallis entropy and has important applications in the field of statistical mechanics
(Tsallis, 1988). Note that for our current purposes, we will only consider the quantities $\mathcal{P}_\alpha$, $\mathcal{H}_\alpha$, and $\mathcal{D}_\alpha$ for $\alpha$
satisfying $0 < \alpha < 1$.

In what follows the summation symbol without subscripts ($\sum$) will indicate summation with respect to the index
$i$ ($i = 1,\dots,m$), whereas $p = (p_i)_{i=1}^m$ and $q = (q_i)_{i=1}^m$ will (typically) denote the marginal distributions of the bivariate
variable $Z = (X,Y)$ whose distribution is denoted by $(p_{ij})_{i,j=1}^m$. Additionally, the uniform distribution on $m$ points
will be denoted by $u$. An important relation between the Renyi entropy and the Renyi divergence is

$$\mathcal{H}_\alpha(p) = \log m - \mathcal{D}_\alpha(p,u). \tag{2.4}$$


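We note also the following monotonicity property of $\mathcal{D}_\alpha$ and $\mathcal{H}_\alpha$ with respect to the index $\alpha$.

Lemma 1. For $0 < \alpha < \beta < 1$ we have $\mathcal{D}_\alpha(p,q) \le \mathcal{D}_\beta(p,q)$. Thus, in view of (2.4), also $\mathcal{H}_\alpha(p) \ge \mathcal{H}_\beta(p)$.

Proof. Note that for $x > 0$ the function $x \mapsto x^{(\alpha-1)/(\beta-1)}$ is strictly convex for $0 < \alpha < \beta < 1$, since its exponent $(1-\alpha)/(1-\beta)$ exceeds one. Therefore, by Jensen's
inequality (assuming $p_i > 0$ for all $i$)

$$\mathcal{D}_\alpha(p,q) = \frac{1}{\alpha-1}\log\sum_i p_i\Big(\frac{q_i}{p_i}\Big)^{1-\alpha} \le \frac{1}{\alpha-1}\cdot\frac{\alpha-1}{\beta-1}\log\sum_i p_i\Big(\frac{q_i}{p_i}\Big)^{1-\beta} = \mathcal{D}_\beta(p,q).$$

□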


Example 2.1 (Hill's Number). For a given $0 < \alpha < 1$, the measure of diversity of a distribution $p$ also known as the
effective number of classes may be defined as (see, e.g., Jost, 2007; Chao et al., 2012; Rempala and Seweryn, 2013)
$ENC_\alpha(p) = \exp(\mathcal{H}_\alpha(p)) = \mathcal{P}_\alpha(p)^{1/(1-\alpha)}$. It then follows from Lemma 1 that for any $0 < \alpha < \beta < 1$ we have
$ENC_\alpha(p) \ge ENC_\beta(p)$. (As it turns out, this inequality may in fact be extended to arbitrary positive $\alpha < \beta$.)
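As a quick numerical illustration of (2.1) to (2.4), Lemma 1 and Example 2.1, the following minimal Python sketch computes the power sums, the Renyi entropy and divergence, and the Hill number for small, hand-picked distributions. The function names are ours and purely illustrative, and all distributions are assumed to have strictly positive entries.

```python
import math


def power_sum(p, alpha):
    """P_alpha(p) = sum_i p_i**alpha, eq. (2.1)."""
    return sum(pi ** alpha for pi in p)


def cross_power_sum(p, q, alpha):
    """P_alpha(p, q) = sum_i p_i**alpha * q_i**(1 - alpha), eq. (2.2)."""
    return sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q))


def renyi_entropy(p, alpha):
    """H_alpha(p) = log(P_alpha(p)) / (1 - alpha)."""
    return math.log(power_sum(p, alpha)) / (1 - alpha)


def renyi_divergence(p, q, alpha):
    """D_alpha(p, q) = log(P_alpha(p, q)) / (alpha - 1)."""
    return math.log(cross_power_sum(p, q, alpha)) / (alpha - 1)


def hill_number(p, alpha):
    """Effective number of classes ENC_alpha(p) = exp(H_alpha(p)), Example 2.1."""
    return math.exp(renyi_entropy(p, alpha))


p = [0.5, 0.25, 0.125, 0.125]
q = [0.1, 0.2, 0.3, 0.4]
u = [0.25] * 4

# Relation (2.4): H_alpha(p) = log m - D_alpha(p, u)
print(renyi_entropy(p, 0.5), math.log(4) - renyi_divergence(p, u, 0.5))
# Lemma 1: D_alpha is non-decreasing and ENC_alpha non-increasing in alpha
print(renyi_divergence(p, q, 0.3) <= renyi_divergence(p, q, 0.7))
print(hill_number(p, 0.3) >= hill_number(p, 0.7))
```

For the uniform distribution $u$ on $m$ points the Hill number equals $m$ for every $\alpha$, which is a convenient sanity check.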


2.1 Low Diversity Condition and Projection Variables 

The notion of an infinite-dimension contingency table brought up in the introduction may now be formally introduced
simply as the requirement that for an $n$-size sample from $(p_{ij})_{i,j=1}^m$ we have $m \to \infty$ as $n \to \infty$. Throughout the paper, let $a \wedge b$
denote $\min(a,b)$ for any real $a, b$ and let $a_n \sim b_n$ (resp. $a_n \sim O(b_n)$) denote $a_n/b_n \to 1$ (resp. $A < \limsup_n a_n/b_n < B$
for some finite $A, B$) as $n \to \infty$ for any real sequences $a_n, b_n$. Throughout the paper we consider only the low diversity
(LD) schemes in which the marginals $p, q$ of $Z$ satisfy the following LD condition:

$$(n p_*)^{-1} = o(n^{-\tau}) \quad \text{for some } \tau > 0, \tag{2.5}$$

where $p_* = \min_i(p_i) \wedge \min_i(q_i)$. Note that since $p_* \le 1/m$, (2.5) implies in particular $m/n = o(n^{-\tau})$. As it turns
out, for many distributions $p$ the two conditions are in fact equivalent, as seen in the following.

Example 2.2 (Power Law Model). Let $p = q$ and assume that $p_i = H^{-1}(\beta,m)/(i^\beta l(i))$ $(i = 1,\dots,m)$, where
$\beta > 0$, $l(x)$ is a non-decreasing slowly varying function (see, e.g., Soulier, 2009, chapter 1), and $H(\beta,m) = \sum_{i=1}^m (i^\beta l(i))^{-1}$
is the normalizing constant. Note that if $0 < \beta < 1$ then $H^{-1}(\beta,m) \sim (1-\beta)\,l(m)/m^{1-\beta}$, and (2.5)
is implied by $m/n = o(n^{-\tau})$ since

$$(n \min_i p_i)^{-1} \sim (1-\beta)^{-1}\frac{m^\beta\, l(m)}{n\, m^{\beta-1}\, l(m)} = (1-\beta)^{-1}\frac{m}{n}.$$
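The asymptotic identity at the end of Example 2.2 can be checked numerically. The sketch below assumes, purely for illustration, $l(x) = 1$, builds the normalized power law, and compares $1/(m\min_i p_i)$, which is the ratio of $(n\min_i p_i)^{-1}$ to $m/n$ and does not depend on $n$, with its limit $(1-\beta)^{-1}$.

```python
def power_law(m, beta):
    """p_i = H^{-1}(beta, m) / i**beta, i = 1..m, with the slowly varying l(x) = 1 (assumption)."""
    weights = [i ** -beta for i in range(1, m + 1)]
    h = sum(weights)  # normalizing constant H(beta, m)
    return [w / h for w in weights]


def ld_ratio(m, beta):
    """(n * min_i p_i)^{-1} divided by m / n, i.e. 1 / (m * min_i p_i)."""
    p = power_law(m, beta)
    return 1.0 / (m * min(p))


for m in (10**3, 10**4, 10**5):
    print(m, ld_ratio(m, 0.5), 1 / (1 - 0.5))
```

The approach to the limit is slow for $\beta$ close to one, because the normalizing constant has a lower-order correction relative to its leading term $m^{1-\beta}/(1-\beta)$.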

For any $0 < \alpha < 1$ and a given pair $(m,n)$, let us define two random variables which will play an important role
in the following section. Let $W_n^{(\alpha)}$ be defined as

$$P\big(W_n^{(\alpha)} = \alpha p_i^{\alpha-1}\big) = p_i \tag{2.6}$$

for $i = 1,\dots,m$. Similarly, define also $V_n^{(\alpha)}$ as

$$P\Big(V_n^{(\alpha)} = \alpha\Big(\frac{q_i}{p_i}\Big)^{1-\alpha} + (1-\alpha)\Big(\frac{p_j}{q_j}\Big)^{\alpha}\Big) = p_{ij} \tag{2.7}$$

for $i, j = 1,\dots,m$. In the following, for the reasons discussed below, we refer to (2.6) and (2.7) as the projection
variables or simply projections.


Remark 2.1. Note that

$$E W_n^{(\alpha)} = \alpha\,\mathcal{P}_\alpha(p)$$

and $\operatorname{Var} W_n^{(\alpha)} = 0$ iff $p_i = 1/m$ for all $i$, that is, $p = (p_i) = u$ is a uniform distribution on $m$ support points (this case
is often referred to as a maximal diversity model or a pure noise model). Similarly,

$$E V_n^{(\alpha)} = \mathcal{P}_\alpha(p,q)$$

and it is also easy to see that $\operatorname{Var} V_n^{(\alpha)} = 0$ iff $p_i = q_i$ for all $i$, that is, $p = q$.

As it turns out, both cases $p = u$ and $p = q$ require special consideration in the asymptotic analysis of $\mathcal{H}_\alpha$ and
$\mathcal{D}_\alpha$. In view of the remark above, they may be referred to as the cases of "degenerate" (zero variance) projections.
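Remark 2.1 can be verified directly by enumerating the finite supports of (2.6) and (2.7). In the sketch below (function names hypothetical) a bivariate distribution is a nested list `pij[i][j]` whose row sums give the marginal $p$ of $X$ and whose column sums give the marginal $q$ of $Y$; each helper returns the projection as (value, probability) pairs.

```python
def mean_var(support):
    """Mean and variance of a discrete variable given as (value, probability) pairs."""
    mean = sum(v * w for v, w in support)
    var = sum(w * (v - mean) ** 2 for v, w in support)
    return mean, var


def projection_w(p, alpha):
    """Support of W^(alpha) in (2.6): value alpha * p_i**(alpha - 1) with probability p_i."""
    return [(alpha * pi ** (alpha - 1), pi) for pi in p]


def projection_v(pij, alpha):
    """Support of V^(alpha) in (2.7) for an m x m table with marginals p (rows) and q (columns)."""
    m = len(pij)
    p = [sum(row) for row in pij]
    q = [sum(pij[i][j] for i in range(m)) for j in range(m)]
    return [
        (alpha * (q[i] / p[i]) ** (1 - alpha) + (1 - alpha) * (p[j] / q[j]) ** alpha, pij[i][j])
        for i in range(m)
        for j in range(m)
    ]


alpha = 0.5
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
independent = [[pi * qj for qj in q] for pi in p]

print(mean_var(projection_w(p, alpha)))            # mean = alpha * P_alpha(p), positive variance
print(mean_var(projection_w([1 / 3] * 3, alpha)))  # uniform p: zero variance (up to rounding)
print(mean_var(projection_v(independent, alpha)))  # mean = P_alpha(p, q)
```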


Example 2.3 (Noise-and-Signal and Pure Noise Models). A distribution concentrated on $m+1$ support points, such
that $p_0 > 0$ and $p_i = (1-p_0)/m$ for $1 \le i \le m$, may be considered as a simple model of signal contamination. Note
that in this case we have $P(W_n^{(\alpha)} = \alpha p_0^{\alpha-1}) = p_0$, $P(W_n^{(\alpha)} = \alpha m^{1-\alpha}(1-p_0)^{\alpha-1}) = 1 - p_0$ and

$$\operatorname{Var} W_n^{(\alpha)} = \alpha^2\left[m^{1-\alpha}(1-p_0)^{\alpha}\Big(\frac{p_0}{1-p_0}\Big)^{1/2} - p_0^{\alpha}\Big(\frac{1-p_0}{p_0}\Big)^{1/2}\right]^2.$$

For the pure noise model $p_0 = 0$, in which case the support reduces to $m$ points and the above formula is not valid.
However, as already pointed out before, in this case we may show directly that $\operatorname{Var} W_n^{(\alpha)} = 0$.
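The closed-form variance in Example 2.3 can be cross-checked against a direct computation over the $m+1$ support points. The sketch below does this for a few illustrative values of $p_0$, $m$ and $\alpha$ chosen by us; it requires $0 < p_0 < 1$, matching the remark that the formula fails for the pure noise model.

```python
import math


def var_w_closed_form(p0, m, alpha):
    """Var W^(alpha) for the noise-and-signal model of Example 2.3 (requires 0 < p0 < 1)."""
    a = m ** (1 - alpha) * (1 - p0) ** alpha * math.sqrt(p0 / (1 - p0))
    b = p0 ** alpha * math.sqrt((1 - p0) / p0)
    return alpha ** 2 * (a - b) ** 2


def var_w_direct(p, alpha):
    """Var W^(alpha) computed from its definition (2.6)."""
    values = [alpha * pi ** (alpha - 1) for pi in p]
    mean = sum(v * pi for v, pi in zip(values, p))
    return sum(pi * (v - mean) ** 2 for v, pi in zip(values, p))


cases = [(0.1, 50, 0.5), (0.4, 1000, 0.3), (0.02, 200, 0.8)]
for p0, m, alpha in cases:
    p = [p0] + [(1 - p0) / m] * m
    print(p0, m, alpha, var_w_closed_form(p0, m, alpha), var_w_direct(p, alpha))
```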














3 Limit Theorems 


Let $N(0,1)$ denote the standard Gaussian random variable and $\Rightarrow$ denote the usual weak convergence in the space of
probability distributions. Define also the plug-in $n$-sample estimates of $p$ and $q$ as, respectively, $\hat p = (\hat p_i)_{i=1}^m$, where
$\hat p_i = \sum_{k=1}^n I(X_k = i)/n$, and $\hat q = (\hat q_i)_{i=1}^m$, where $\hat q_i = \sum_{k=1}^n I(Y_k = i)/n$. Here and elsewhere in the paper $I(\cdot)$ denotes
the indicator function. As it turns out, two distinct sets of CLTs may be derived depending on whether the variables
$W_n^{(\alpha)}$ and $V_n^{(\alpha)}$ are degenerate (that is, their respective variances vanish) or not. For the non-degenerate case the
appropriate CLTs may be established by expanding on the usual projection and Taylor expansion arguments (see,
e.g., Shao, 2003, chapter 1). This is the simpler case to consider and we discuss it first.


3.1 CLTs for Non-Degenerate Projections 

The first two CLT results for the empirical (plug-in) Renyi entropy and divergence and their corresponding power
sums are provided in Theorems 1 and 2 below. Their respective hypotheses may be viewed as complementing those of
Zhang and Zhang (2012) and Zhang and Grabchak (2014). Note that $\mathcal{P}_\alpha(p) = ENC_\alpha(p)^{1-\alpha}$, where the Hill number
$ENC_\alpha$ is defined in Example 2.1. The proofs are deferred to the appendix.

Recall that for any square integrable random variable $X$ such that $EX \ne 0$, we define its coefficient of variation
as $cv(X) = (\operatorname{Var} X)^{1/2}\,|EX|^{-1}$.

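Theorem 1 (Renyi Entropy CLT). Let $W_n^{(\alpha)}$ be a sequence of random variables defined by (2.6) with $\inf_n cv(W_n^{(\alpha)}) > 0$ and let

$$\sum_i p_i^{\alpha-1}\big(n \operatorname{Var} W_n^{(\alpha)}\big)^{-1/2} \to 0 \quad \text{for } m, n \to \infty. \tag{3.1}$$

Then, under the LD condition (2.5), as $m, n \to \infty$

(i) $\mathcal{P}_\alpha(\hat p)/\mathcal{P}_\alpha(p) \to 1$ in probability,

(ii) $\sqrt{n}\,\big(\mathcal{P}_\alpha(\hat p) - \mathcal{P}_\alpha(p)\big)/\big(\operatorname{Var} W_n^{(\alpha)}\big)^{1/2} \Rightarrow N(0,1)$,

(iii) $\sqrt{n}\,(1/\alpha - 1)\big(\mathcal{H}_\alpha(\hat p) - \mathcal{H}_\alpha(p)\big)/cv(W_n^{(\alpha)}) \Rightarrow N(0,1)$.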
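Theorem 1(iii) suggests an approximate confidence interval for $\mathcal{H}_\alpha(p)$. The sketch below is our own illustration, not a procedure stated in the paper: it replaces the unknown $cv(W_n^{(\alpha)})$ by its plug-in value computed from $\hat p$ (an assumption whose validity the theorem does not address) and inverts the normal limit. The helper name `renyi_entropy_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def renyi_entropy_ci(counts, alpha, level=0.95):
    """Plug-in H_alpha(p_hat) and a normal-approximation interval based on Theorem 1(iii)."""
    n = sum(counts)
    p_hat = [c / n for c in counts if c > 0]  # empty categories contribute nothing to P_alpha
    s = sum(pi ** alpha for pi in p_hat)      # P_alpha(p_hat)
    h_hat = math.log(s) / (1 - alpha)         # H_alpha(p_hat)
    # Plug-in moments of W = alpha * p_I**(alpha - 1) with I drawn from p_hat
    mean_w = alpha * s
    second_w = alpha ** 2 * sum(pi ** (2 * alpha - 1) for pi in p_hat)
    cv = math.sqrt(max(second_w - mean_w ** 2, 0.0)) / mean_w
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * cv / (math.sqrt(n) * (1 / alpha - 1))
    return h_hat, (h_hat - half_width, h_hat + half_width)


counts = [120, 80, 40, 30, 20, 10, 5, 3, 1, 1]
h, (lo, hi) = renyi_entropy_ci(counts, alpha=0.5)
print(h, lo, hi)
print(math.exp(lo), math.exp(hi))  # induced interval for the Hill number ENC_1/2
```

Because $ENC_\alpha = \exp(\mathcal{H}_\alpha)$ is monotone, exponentiating the endpoints gives a corresponding interval for the Hill number.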

Remark 3.1. Note that the first two assertions of the theorem may be equivalently stated in terms of the convergence
of the Tsallis plug-in entropy defined by $\mathcal{T}_\alpha(\hat p) = (\mathcal{P}_\alpha(\hat p) - 1)/(1-\alpha)$ (cf. (2.3)).


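Remark 3.2. Note that the condition (3.1) is typically stronger than (2.5). Indeed, taking $\alpha > 1/2$ and the power
law model from Example 2.2 with $0 < \beta < 1$, we obtain $\sum_i p_i^\alpha \sim (1-\beta)^\alpha m^{1-\alpha}/(1-\alpha\beta)$ and $\sum_i p_i^{2\alpha-1} \sim (1-\beta)^{2\alpha-1} m^{2(1-\alpha)}/(1-2\alpha\beta+\beta)$,
so that $n \operatorname{Var} W_n^{(\alpha)} = n\alpha^2\big(\sum_i p_i^{2\alpha-1} - (\sum_i p_i^\alpha)^2\big)$ is of order $n\,m^{2(1-\alpha)}$. Consequently, using $\sum_i p_i^{\alpha-1} \ge m^{2-\alpha}$ (Jensen's inequality), for some constant $C > 1$

$$C\,\frac{\sum_i p_i^{\alpha-1}}{\sqrt{n\big(\sum_i p_i^{2\alpha-1} - (\sum_i p_i^\alpha)^2\big)}} \ge \frac{m^{2-\alpha}}{m^{1-\alpha}\sqrt{n}} = \frac{m}{\sqrt{n}}$$

for large $m, n$, and (3.1) implies (2.5) with $\tau = 1/2$. Similarly (possibly for a different $C > 1$),

$$\frac{\sum_i p_i^{\alpha-1}}{\sqrt{n\big(\sum_i p_i^{2\alpha-1} - (\sum_i p_i^\alpha)^2\big)}} \le C\,\frac{m(\min_i p_i)^{\alpha-1}}{m^{1-\alpha}\sqrt{n}} \le C\,\frac{m}{\sqrt{n}},$$

and therefore in this case (3.1) is seen to be actually equivalent to (2.5) with $\tau = 1/2$.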
Remark 3.3 (Plug-in Bias). Note that, in view of Jensen's inequality applied to the strictly concave function $x \mapsto x^\alpha$
for $x > 0$ and $0 < \alpha < 1$, we have $E\mathcal{P}_\alpha(\hat p)/\mathcal{P}_\alpha(p) \le 1$. This and assertion (i) above together imply that under
the assumptions of Theorem 1 the relative bias of $\mathcal{P}_\alpha(\hat p)$ satisfies $E\mathcal{P}_\alpha(\hat p)/\mathcal{P}_\alpha(p) - 1 \to 0$ as $n, m \to \infty$. The
standard inequality $\log x \le x - 1$, valid for $x > 0$, then implies that the bias of the plug-in entropy estimate satisfies

$$E\mathcal{H}_\alpha(\hat p) - \mathcal{H}_\alpha(p) \to 0 \quad \text{as } n, m \to \infty. \tag{3.2}$$

Unfortunately, as may be seen from the proof of Theorem 1 in the appendix, a more careful analysis of the tail
events for the plug-in estimate than the one currently performed is needed in order to actually establish a convergence
rate in (3.2).
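The inequality $E\mathcal{P}_\alpha(\hat p) \le \mathcal{P}_\alpha(p)$ in Remark 3.3 can be observed without simulation: each count $n\hat p_i$ is Binomial$(n, p_i)$ and the power sum is additive over categories, so $E\mathcal{P}_\alpha(\hat p)$ can be computed exactly. The sketch below does this for an illustrative power law ($\beta = 1$, $l(x) = 1$, $m = 50$, $\alpha = 1/2$, all our choices). Since $E\hat p_i^\alpha$ is the Bernstein polynomial of $x \mapsto x^\alpha$ evaluated at $p_i$, a classical property of Bernstein polynomials of concave functions suggests the relative bias shrinks monotonically as $n$ grows with $m$ fixed.

```python
import math


def binom_pmf(k, n, p):
    """Binomial(n, p) probability of k, evaluated in log-space to avoid overflow."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )
    return math.exp(log_pmf)


def expected_plugin_power_sum(p, n, alpha):
    """Exact E[P_alpha(p_hat)]: each n * p_hat_i ~ Binomial(n, p_i), and the sum is linear in i."""
    return sum(
        binom_pmf(k, n, pi) * (k / n) ** alpha
        for pi in p
        for k in range(n + 1)
    )


def power_sum(p, alpha):
    return sum(pi ** alpha for pi in p)


m, alpha = 50, 0.5
weights = [1.0 / i for i in range(1, m + 1)]  # power law with beta = 1, l(x) = 1 (illustrative)
total = sum(weights)
p = [w / total for w in weights]

ratios = [expected_plugin_power_sum(p, n, alpha) / power_sum(p, alpha) for n in (50, 200, 800)]
print(ratios)  # each ratio is below 1 and increases towards 1 with n
```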

Turning now to our second result, note that the relation (2.4) suggests that the CLT of Theorem 1 could also be
extended to the Renyi divergence. The proof is again based on the Taylor expansion method, where now the projection
variable (2.6) is replaced by (2.7).

Theorem 2 (Renyi Divergence CLT). Let $V_n^{(\alpha)}$ be a sequence of random variables defined by (2.7) with $\inf_n cv(V_n^{(\alpha)}) > 0$ and let

$$\Big(\sum_i (q_i/p_i)^{1-\alpha} + \sum_i (p_i/q_i)^{\alpha}\Big)\big(n \operatorname{Var} V_n^{(\alpha)}\big)^{-1/2} \to 0 \quad \text{for } m, n \to \infty. \tag{3.3}$$

Then, under the LD condition (2.5), as $m, n \to \infty$

(i) $\mathcal{P}_\alpha(\hat p,\hat q)/\mathcal{P}_\alpha(p,q) \to 1$ in probability,

(ii) $\sqrt{n}\,\big(\mathcal{P}_\alpha(\hat p,\hat q) - \mathcal{P}_\alpha(p,q)\big)/\big(\operatorname{Var} V_n^{(\alpha)}\big)^{1/2} \Rightarrow N(0,1)$,

(iii) $\sqrt{n}\,(\alpha - 1)\big(\mathcal{D}_\alpha(\hat p,\hat q) - \mathcal{D}_\alpha(p,q)\big)/cv(V_n^{(\alpha)}) \Rightarrow N(0,1)$.


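Remark 3.4 (Plug-in Bias). Note that, similarly as in Remark 3.3, we have $E\mathcal{P}_\alpha(\hat p,\hat q)/\mathcal{P}_\alpha(p,q) \le 1$ (since $(x,y) \mapsto x^\alpha y^{1-\alpha}$ is
concave on the positive quadrant) and, by a similar argument as before, Theorem 2(i) implies

$$E\mathcal{D}_\alpha(\hat p,\hat q) - \mathcal{D}_\alpha(p,q) \to 0 \quad \text{as } n, m \to \infty.$$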


Example 3.1 (Symmetric Divergence for Power Laws). Consider the symmetric divergence $\mathcal{D}_{1/2}(p,q)$ with independent
marginals, which is often the case of interest in NGS applications. Note that in this situation $\operatorname{Var} V_n^{(1/2)} = 1/2 - (\sum_i \sqrt{p_i q_i})^2/2$.
Suppose additionally that $p_i = H^{-1}(\beta_1,m)/(i^{\beta_1} l(i))$ and $q_i = H^{-1}(\beta_2,m)/(i^{\beta_2} l(i))$ $(i = 1,\dots,m)$,
where the notation is as in Example 2.2 with $0 < \beta_1 \ne \beta_2 < 1$. Then

$$\operatorname{Var} V_n^{(1/2)} \sim \frac{1}{2} - \frac{2(1-\beta_1)(1-\beta_2)}{(2-\beta_1-\beta_2)^2}$$

and, consequently, (3.3) is seen to be equivalent to $m/\sqrt{n} \to 0$ (cf. also Remark 3.2 above).
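The identity $\operatorname{Var} V_n^{(1/2)} = 1/2 - (\sum_i \sqrt{p_i q_i})^2/2$ for independent marginals can be confirmed by enumerating the $m^2$ support points of (2.7). The sketch below does so for power-law marginals with $l(x) = 1$ (an illustrative assumption) and also reports the large-$m$ limit of Example 3.1, without claiming how quickly the two agree.

```python
import math


def power_law(m, beta):
    """Normalized p_i proportional to i**(-beta), i = 1..m (slowly varying part l = 1)."""
    w = [i ** -beta for i in range(1, m + 1)]
    s = sum(w)
    return [x / s for x in w]


def var_v_half_enumerated(p, q):
    """Var V^(1/2) for p_ij = p_i * q_j, by direct enumeration of (2.7) with alpha = 1/2."""
    support = [
        (0.5 * math.sqrt(q[i] / p[i]) + 0.5 * math.sqrt(p[j] / q[j]), p[i] * q[j])
        for i in range(len(p))
        for j in range(len(q))
    ]
    mean = sum(v * w for v, w in support)
    return sum(w * (v - mean) ** 2 for v, w in support)


def var_v_half_formula(p, q):
    """1/2 - BC**2 / 2, with BC = sum_i sqrt(p_i q_i) the Bhattacharyya coefficient."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return 0.5 - bc ** 2 / 2


def var_v_half_limit(b1, b2):
    """Large-m limit stated in Example 3.1."""
    return 0.5 - 2 * (1 - b1) * (1 - b2) / (2 - b1 - b2) ** 2


b1, b2, m = 0.3, 0.6, 200
p, q = power_law(m, b1), power_law(m, b2)
print(var_v_half_enumerated(p, q), var_v_half_formula(p, q), var_v_half_limit(b1, b2))
```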

With some additional effort, the two CLT results of this section may be extended to degenerate projections. This 
is discussed in the next section. 


3.2 CLTs for Degenerate Projections 

In the case of a degenerate projection, the linear term of the power sum Taylor expansion disappears (cf. formula
(B.6) in the appendix) and the condition (3.1) is no longer needed. However, the LD assumption (2.5) has to be
slightly strengthened in order to establish the asymptotic results for the leading (quadratic) term of the appropriate
expansion.
3.2.1 Chi-Square Statistic CLT


The following lemma describing the chi-square statistic CLT may be of independent interest for models of sparse
contingency tables. For a recent discussion of a normal approximation to the chi-square statistic in such settings,
see, e.g., Horgan and Murphy (2013). Here we apply the chi-square CLT formulated below to obtain weak limits for
the quadratic terms in the entropy and divergence Taylor expansions, leading to Theorems 3 and 4 described in the
next subsection. To begin, consider a pair of distributions $(p,q)$ and a set of positive weights $r = (r_i)_{i=1}^m$ and define
the corresponding chi-square ($\chi^2$) distance function as

$$\chi^2_r(p,q) = n\sum_i \frac{(p_i - q_i)^2}{r_i}.$$

Note that, for instance, the $\chi^2$-distance statistic between the empirical marginals $(\hat p, \hat q)$ is obtained by setting $r_i = p_i + q_i$,

$$\chi^2_{p+q}(\hat p,\hat q) = n\sum_i \frac{(\hat p_i - \hat q_i)^2}{p_i + q_i},$$

and the Pearson $\chi^2$-statistic is obtained by setting $r_i = p_i$,

$$\chi^2_p(\hat p, p) = n\sum_i \frac{(\hat p_i - p_i)^2}{p_i}. \tag{3.4}$$

Below we write $\chi^2_u$ for the Pearson statistic $\chi^2_u(\hat u, u)$ computed under the uniform distribution $u$.


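Lemma 2. Let $(p_{ij})_{i,j=1}^m$ be the bivariate distribution of $Z = (X,Y)$ with $X$ and $Y$ having marginals $(p_i)_{i=1}^m$ and
$(q_i)_{i=1}^m$, where $p_i = q_i > 0$. Assume $m \to \infty$ as $n \to \infty$ and

$$\frac{1}{mn}\sum_i \max\big(p_i^{-1},\, p_i^{-2} m^{-1}\big) \to 0. \tag{3.5}$$

Then as $n \to \infty$

(i) $\dfrac{\chi^2_p(\hat p, p) - m}{\sqrt{2m}} \Rightarrow N(0,1)$,

and if additionally

$$\sup_n \max_{i,j} \frac{p_{ij}}{p_i p_j} = B < \infty, \tag{3.6}$$

then also

(ii) $\dfrac{\chi^2_{p+q}(\hat p,\hat q) - \mu_n}{\sqrt{2}\,\gamma_n} \Rightarrow N(0,1)$,

where

$$\mu_n = \sum_i \Big(1 - \frac{p_{ii}}{p_i}\Big), \qquad \gamma_n^2 = \sum_i \frac{(p_i - p_{ii})^2}{p_i^2} + \sum_{1 \le i < j \le m} \frac{(p_{ij} + p_{ji})^2}{2 p_i p_j}. \tag{3.7}$$

Remark 3.5. Note that for $p = u$ the condition (3.5) simplifies to $m/n \to 0$.

Remark 3.6. Note that under the assumption (3.6) we have $m - 2B \le \gamma_n^2 \le m + B^2$ (as well as $m - B \le \mu_n \le m$) and therefore $\mu_n \sim \gamma_n^2 \sim m$. In particular,
if $p_{ij} = p_i p_j$ then $\mu_n = \gamma_n^2 = m - 1$.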

The proof of the result may be found in the appendix. Its application is discussed next. 
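The centering and scaling constants in (3.7) are easy to evaluate for a given table. The sketch below (function names hypothetical) computes $\mu_n$ and $\gamma_n^2$ for an $m \times m$ table with equal marginals, and checks Remark 3.6 on an independent table and on an illustrative dependent table of our own construction that keeps the uniform marginals but moves a fraction $\rho$ of the mass to the diagonal.

```python
def mu_gamma(pij):
    """mu_n and gamma_n^2 from (3.7) for an m x m table with equal marginals p = q."""
    m = len(pij)
    p = [sum(row) for row in pij]  # common marginal
    mu = sum(1 - pij[i][i] / p[i] for i in range(m))
    gamma2 = sum((p[i] - pij[i][i]) ** 2 / p[i] ** 2 for i in range(m))
    gamma2 += sum(
        (pij[i][j] + pij[j][i]) ** 2 / (2 * p[i] * p[j])
        for i in range(m)
        for j in range(i + 1, m)
    )
    return mu, gamma2


def ratio_bound(pij):
    """B = max_{i,j} p_ij / (p_i p_j), the quantity bounded in (3.6)."""
    p = [sum(row) for row in pij]
    return max(pij[i][j] / (p[i] * p[j]) for i in range(len(p)) for j in range(len(p)))


m, rho = 10, 0.2
p = [1 / m] * m
independent = [[p[i] * p[j] for j in range(m)] for i in range(m)]
# Same marginals, but a fraction rho of the mass sits on the diagonal
dependent = [[(1 - rho) * p[i] * p[j] + (rho * p[i] if i == j else 0.0) for j in range(m)] for i in range(m)]

print(mu_gamma(independent))  # both values equal m - 1 up to rounding
print(mu_gamma(dependent), ratio_bound(dependent))
```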

















3.2.2 Pure Noise and Equal Marginals CLTs 


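The first result covers the case of the Renyi entropy when $p = u$. The proof is outlined in the appendix. Recall that for
real $a$ and integer $k$ we define $\binom{a}{k} = a(a-1)\cdots(a-k+1)/k!$.

Theorem 3 (Uniform Entropy CLT). Assume $m \to \infty$ as $n \to \infty$ and $m^2/n = o(n^{-\tau})$ for some $\tau > 0$. Let $\hat u$ denote the plug-in estimate of $u$. Then

(i) $\dfrac{n\binom{\alpha}{2}^{-1}\big(m^{\alpha-1}\mathcal{P}_\alpha(\hat u) - 1\big) - m}{\sqrt{2m}} \Rightarrow N(0,1)$,

(ii) $\dfrac{n\Big[\mathcal{H}_\alpha(\hat u) - \log m - (1-\alpha)^{-1}\log\Big(1 + \binom{\alpha}{2}\dfrac{m}{n}\Big)\Big]}{\alpha\sqrt{m/2}} \Rightarrow N(0,1)$.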
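The centering in Theorem 3(ii) reflects the second-order expansion $m^{\alpha-1}\mathcal{P}_\alpha(\hat u) \approx 1 + \binom{\alpha}{2}\chi^2_u/n$, where $\chi^2_u = nm\sum_i(\hat u_i - 1/m)^2$ is the Pearson statistic (3.4) with $p = u$. The deterministic sketch below, which is our illustration and not part of the proof, perturbs $u$ slightly and compares the exact $\mathcal{H}_\alpha(\hat u)$ with $\log m + (1-\alpha)^{-1}\log\big(1 + \binom{\alpha}{2}\chi^2_u/n\big)$; the values of $m$, $n$, $\alpha$ and the perturbation size are arbitrary choices.

```python
import math


def gen_binom(a, k):
    """Generalized binomial coefficient a(a-1)...(a-k+1)/k!."""
    out = 1.0
    for j in range(k):
        out *= a - j
    return out / math.factorial(k)


def renyi_entropy(p, alpha):
    return math.log(sum(x ** alpha for x in p)) / (1 - alpha)


def uniform_expansion(u_hat, n, alpha):
    """log m + (1 - alpha)^{-1} log(1 + binom(alpha, 2) * chi2_u / n)."""
    m = len(u_hat)
    chi2 = n * m * sum((x - 1 / m) ** 2 for x in u_hat)  # Pearson statistic (3.4) with p = u
    return math.log(m) + math.log(1 + gen_binom(alpha, 2) * chi2 / n) / (1 - alpha)


m, n, alpha, eps = 10, 10_000, 0.5, 1e-3
u_hat = [1 / m + (eps if i % 2 == 0 else -eps) for i in range(m)]  # perturbation summing to zero
exact = renyi_entropy(u_hat, alpha)
approx = uniform_expansion(u_hat, n, alpha)
print(exact - math.log(m), approx - math.log(m))
```

Because $\binom{\alpha}{2} < 0$ for $0 < \alpha < 1$, the argument of the logarithm in Theorem 3(ii) is positive only when $m/n$ is small enough, which is one way to see why the normalized statistic can be undefined when $n$ is small relative to $m$.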

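Our second CLT result is the following theorem for the Renyi divergence when $p = q$. The proof is again deferred
to the appendix.

Theorem 4 (Degenerate Divergence CLT). Let $(p_{ij})_{i,j=1}^m$ be the bivariate distribution of $Z = (X,Y)$ with $X$ and $Y$
having marginals $p = (p_i)_{i=1}^m$ and $q = (q_i)_{i=1}^m$, where $p_i = q_i > 0$. Let $\mu_n$ and $\gamma_n$ be given by (3.7). Assume $m \to \infty$
as $n \to \infty$ and that (3.6) holds, as well as that

$$\max\Big(\frac{1}{n\min_{i,j} p_{ij}},\ \frac{m}{n\min_i p_i}\Big) = o(n^{-\tau}) \quad \text{for some } \tau > 0. \tag{3.8}$$

Then

(i) $\dfrac{n\big(\alpha(\alpha-1)\big)^{-1}\big[\mathcal{P}_\alpha(\hat p,\hat q) - 1\big] - \mu_n}{\sqrt{2}\,\gamma_n} \Rightarrow N(0,1)$,

(ii) $\dfrac{n\Big[\mathcal{D}_\alpha(\hat p,\hat q) - (\alpha-1)^{-1}\log\Big(1 + \alpha(\alpha-1)\dfrac{\mu_n}{n}\Big)\Big]}{\alpha\sqrt{2}\,\gamma_n} \Rightarrow N(0,1)$.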


Remark 3.7. Note that for $p = q = u$ the condition (3.8) reduces to $m^2/n = o(n^{-\tau})$, as required in Theorem 3.
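Similarly, the normalization in Theorem 4 rests on the expansion $\mathcal{P}_\alpha(\hat p,\hat q) - 1 \approx \alpha(\alpha-1)\chi^2_{p+q}(\hat p,\hat q)/n$ when $p = q$. The linear term vanishes because $\sum_i(\hat p_i - p_i) = \sum_i(\hat q_i - q_i) = 0$, and $p_i + q_i = 2p_i$. The following deterministic check with small, hand-picked perturbations is only an illustration of this step.

```python
def cross_power_sum(p, q, alpha):
    """P_alpha(p, q) from (2.2)."""
    return sum(a ** alpha * b ** (1 - alpha) for a, b in zip(p, q))


def quadratic_approx(p_hat, q_hat, p, alpha):
    """alpha(alpha - 1) * chi2_{p+q}(p_hat, q_hat) / n with p = q, i.e. r_i = 2 p_i."""
    chi2_over_n = sum((a - b) ** 2 / (2 * c) for a, b, c in zip(p_hat, q_hat, p))
    return alpha * (alpha - 1) * chi2_over_n


p = [0.4, 0.3, 0.2, 0.1]
eps = 1e-4
# Both perturbations sum to zero, so p_hat and q_hat remain probability vectors
p_hat = [x + eps * s for x, s in zip(p, [1, -1, 1, -1])]
q_hat = [x + eps * s for x, s in zip(p, [-1, 1, 1, -1])]
for alpha in (0.3, 0.5, 0.8):
    exact = cross_power_sum(p_hat, q_hat, alpha) - 1
    print(alpha, exact, quadratic_approx(p_hat, q_hat, p, alpha))
```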


3.2.3 Random Sample Size 


When analyzing NGS data some part of the sequence reads is frequently removed for technical reasons, for instance,
due to poor amplification or reading errors (see next section). In such cases one effectively deals with a molecular
sample of random size. Our CLT results derived earlier may be extended to this case as well, with the help of the
following simple result described in Theorem 5 below. Its various versions have been discussed, for instance, in the
context of random allocations (see, e.g., Kolchin et al., 1978).

Theorem 5 (Randomized Sample CLT). Let $(Z_n)_{n=1}^\infty$ be a sequence of bivariate variables supported on an $m_n \times m_n$
integer lattice with distribution $(p_{ij})_{i,j=1}^{m_n}$. Let $(\hat p_{ij})_{i,j=1}^{m_n}$ $(n = 1,2,3,\dots)$ be the sequence of the empirical
estimates, each based on an iid sample of (deterministic) size $n$. Suppose that the statistic $\Psi_n = \psi_n(\hat p_{ij})$ satisfies
$b_n(\Psi_n - a_n) \Rightarrow N(0,1)$ as $n \to \infty$ with some non-random $(a_n, b_n)$. Let $(\nu_n)_{n=1}^\infty$ be a sequence of random variables
independent of $(Z_n)_{n=1}^\infty$ and following the binomial distributions $\mathrm{bin}(n, \pi_n)$ with $0 < \inf_n \pi_n \le \sup_n \pi_n < 1$. Then also

$$b_{\nu_n}(\Psi_{\nu_n} - a_{\nu_n}) \Rightarrow N(0,1).$$

Proof. Write $\tilde\Psi_n = b_n(\Psi_n - a_n)$, denote by $\tilde\Psi_{n_k}$ the random variable $\tilde\Psi_{\nu_k}$ conditional on the event $\nu_k = n_k$, and denote by $\Phi$ the distribution function
of the standard normal random variable. By assumption, for any real $x$ we have $P(\tilde\Psi_{n_k} \le x) \to \Phi(x)$ provided that
$n_k \to \infty$ as $k \to \infty$. Let $\varepsilon > 0$ be sufficiently small and define $C_\varepsilon(k_0) = \{n_k : k(\pi_k - \varepsilon) \le n_k \le k(\pi_k + \varepsilon),\ k \ge k_0\}$. Note
that by the weak law of large numbers $P(\nu_k \in C_\varepsilon(k_0)) \to 1$ as $k_0 \to \infty$. Therefore, by the independence of $\nu_k$ and the sample,

$$P\big(\tilde\Psi_{\nu_k} \le x,\ \nu_k \in C_\varepsilon(k_0)\big) = \sum_{n_k \in C_\varepsilon(k_0)} P(\tilde\Psi_{n_k} \le x)\,P(\nu_k = n_k) = \big(\Phi(x) + \delta(k_0)\big)\,P\big(\nu_k \in C_\varepsilon(k_0)\big),$$

where $\delta(k_0) \to 0$ as $k_0 \to \infty$. Accordingly, as $k_0 \to \infty$ the left-hand side converges to $\lim_k P(\tilde\Psi_{\nu_k} \le x)$ and the
right-hand side to $\Phi(x)$, and the result follows. □
4 Examples and NGS Applications


We start by providing some numerical examples illustrating that, in general, the CLT results discussed above do
not hold without assumptions on the relative rates of $m$ and $n$. Next, we show two examples of the applicability of our
results to analyzing the biodiversity of NGS data. The first one is concerned with comparing the diversity of T-cell
receptor populations in transgenic mice, whereas the second one aims at identifying hepatocellular carcinoma
transcription profiles in humans. For the purpose of the T-cell receptor example, we propose a sequential statistical
procedure of NGS signal filtering based on our CLT results from the previous sections. We begin by pointing out
some subtleties in the CLT results discussed in Section 3.


4.1 Power Law and Pure Noise Models 


Consider the power law model from Example 2.2 in Section 2.1 with $\beta = 1$ and $l(x) = 1$. Note that in this case
$(n\min_i p_i)^{-1} \sim m\log m/n$ as well as $\sum_i p_i^{\alpha-1}\big(n\operatorname{Var} W_n^{(\alpha)}\big)^{-1/2} \sim O\big(m(\log m/n)^{1/2}\big)$, and therefore the assumptions
of Theorem 1 are satisfied as soon as

$$n^{\tau-1}\, m\log m \to 0 \tag{4.1}$$

for some $\tau > 1/2$. Similarly, the assumption (3.5) of Lemma 2 is satisfied as soon as

$$\frac{m\log^2 m}{n} \to 0. \tag{4.2}$$


In Figure 1 we illustrate the convergence results of Theorem 1(iii) and Lemma 2(i) for this power law model and
$\alpha = 0.5$. The panels of Figure 1 present the sample vs standard normal quantile (QQ) plots for the normalized
Renyi entropy statistic and the normalized Pearson statistic (3.4) based on $B = 5000$ samples from the power law
distribution, each with $m = 1000$ and three different values of $n = m^{1+\varepsilon}$ ($\varepsilon = -0.5, 0.5, 1.5$). As seen from the
plots, in the absence of (4.1) the CLT result for the Renyi entropy (cf. Theorem 1(iii)) does not hold. Moreover,
the middle panel QQ plot indicates that for large $m, n$ satisfying $n = m^{3/2}$ the discrepancy between the distribution of
the entropy function and its plug-in estimate appears in the form of a deterministic shift, indicating the presence of
substantial asymptotic bias and hence the lack of convergence in (3.2). Similarly, when (4.2) is not satisfied, the
Pearson statistic CLT given in Lemma 2(i) fails, with the middle panel again indicating that the bias of the estimate
does not vanish when $m$ is too large relative to $n$.
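The following sketch mirrors the kind of Monte Carlo experiment summarized in Figure 1. It is written by us for illustration: the sampler, the default seed and the plotting positions $(k + 1/2)/B$ are our assumptions, and the true $cv(W_n^{(\alpha)})$ is used in the Renyi normalization. It returns standard normal quantiles together with the sorted normalized Renyi and Pearson statistics, which can be plotted against each other to obtain QQ plots. The paper's setting ($m = 1000$, $B = 5000$, $n$ up to $m^{2.5}$) is far too slow for this pure-Python sampler, so smaller values should be used.

```python
import math
import random
from statistics import NormalDist


def power_law(m, beta=1.0):
    """p_i proportional to i**(-beta) with l(x) = 1; beta = 1 as in Section 4.1."""
    w = [i ** -beta for i in range(1, m + 1)]
    s = sum(w)
    return [x / s for x in w]


def normalized_stats(p, n, alpha, rng):
    """One replicate of the normalized Renyi entropy (Theorem 1(iii)) and Pearson statistic (Lemma 2(i))."""
    m = len(p)
    counts = [0] * m
    for i in rng.choices(range(m), weights=p, k=n):
        counts[i] += 1
    p_hat = [c / n for c in counts]

    # sqrt(n) (1/alpha - 1) (H(p_hat) - H(p)) / cv(W), with cv(W) computed from the true p
    s = sum(x ** alpha for x in p)
    h = math.log(s) / (1 - alpha)
    h_hat = math.log(sum(x ** alpha for x in p_hat)) / (1 - alpha)
    mean_w = alpha * s
    var_w = alpha ** 2 * sum(x ** (2 * alpha - 1) for x in p) - mean_w ** 2
    z_renyi = math.sqrt(n) * (1 / alpha - 1) * (h_hat - h) / (math.sqrt(var_w) / mean_w)

    # Pearson statistic (3.4), centered by m and scaled by sqrt(2m)
    chi2 = sum((c - n * x) ** 2 / (n * x) for c, x in zip(counts, p))
    z_pearson = (chi2 - m) / math.sqrt(2 * m)
    return z_renyi, z_pearson


def simulate(m, eps, B, alpha=0.5, seed=0):
    """B replicates with n = round(m ** (1 + eps)); returns normal quantiles and sorted statistics."""
    rng = random.Random(seed)
    p = power_law(m)
    n = round(m ** (1 + eps))
    draws = [normalized_stats(p, n, alpha, rng) for _ in range(B)]
    renyi = sorted(d[0] for d in draws)
    pearson = sorted(d[1] for d in draws)
    normal = [NormalDist().inv_cdf((k + 0.5) / B) for k in range(B)]
    return normal, renyi, pearson


normal, renyi, pearson = simulate(m=100, eps=0.5, B=200)
```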


[Figure 1 panels: n = m^0.5, n = m^1.5, n = m^2.5]

Figure 1: Projection CLTs. Normal QQ plots for the normalized Renyi entropy (Theorem 1(iii), lower (green)
curve) and the normalized Pearson $\chi^2$ statistic (Lemma 2(i), upper (blue) curve) for the power law distribution $p_i \propto 1/i$.
The panels show quantile plots with different values of $n = m^{1+\varepsilon}$ ($\varepsilon = -0.5, 0.5, 1.5$) and $m = 1000$. The solid (red)
line gives quantiles of the standard normal distribution for reference.

For comparison, we also considered the uniform distribution (pure noise) model $p_i = 1/m$. Note that it may be
viewed as a degenerate power law where $\beta = 0$ and $l(x) = 1$. Recall that according to Theorem 3(ii) and Lemma 2(i),
the sufficient conditions for the respective CLTs are $m^2/n = o(n^{-\tau})$ and $m/n \to 0$ (see Remark 3.5 for the latter
one). The necessity of these conditions is illustrated in the panels of Figure 2, where we again present the (normal)
QQ plots for the Renyi ($\alpha = 0.5$) and the Pearson statistics for the same values of $B$, $n$ and $m$ as in Figure 1. As seen
from these plots, only in the last panel, when $m^2/n \approx 0$, do we get a good CLT approximation for both statistics. These
results appear consistent with our theoretical results from Theorem 3 and Lemma 2.


[Figure 2 panels: n = m^0.5, n = m^1.5, n = m^2.5]

Figure 2: Degenerate Projection CLTs. Normal QQ plots for the normalized uniform Renyi entropy (Theorem 3(ii),
represented by the lower (green) curve) and the normalized Pearson $\chi^2$-statistic (Lemma 2(i), represented
by the upper (blue) curve) with $p_i = m^{-1}$. The panels show quantile plots with different values of $n = m^{1+\varepsilon}$
($\varepsilon = -0.5, 0.5, 1.5$) and $m = 1000$. The solid (red) line gives the quantiles of the standard normal distribution for
reference. Note that the normalized Renyi entropy is undefined for the first panel.


Although not presented here due to space considerations, similar examples based on bivariate power laws
may be used to illustrate the necessity of assumptions of type (3.3) and (3.8) in the CLT results for divergence in
Theorems 2(iii) and 4(ii).


4.2 Applications to NGS Data 

Our CLT results described in Section 3 were originally motivated by questions arising in NGS data analysis. Below we describe two examples which adhere to the following basic framework. Denote by $\varepsilon_1, \varepsilon_2$ two independent noise distributions, each on $m$ support points, and assume that a pair $(\tilde p, \tilde q)$ of marginal distributions may be represented as

$$(\tilde p, \tilde q) = \lambda (p, q) + (1-\lambda)(\varepsilon_1, \varepsilon_2) \qquad (4.3)$$


where $(p, q)$ is a pair of marginal distributions having no common support points with $(\varepsilon_1, \varepsilon_2)$ and $\lambda$ is the mixing proportion (or prior probability of signal). We assume that each $\varepsilon_i$ is a simple finite mixture of $K$ uniform distributions on separate supports. Note that the noise-and-signal model from Example 2.3 in Section 2.1 may be viewed as a (univariate) special case of (4.3) with $K = 1$. In the first example below we took $K = 2$.
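To make the framework (4.3) concrete, here is a minimal univariate sketch that builds the cell probabilities of $\lambda p + (1-\lambda)\varepsilon$ and draws counts from it. Everything specific in it is an assumption for illustration only: the Zipf-type signal $p_i \propto i^{-s}$ (the helper `zipf_signal` is hypothetical and not the paper's power-law parameterization), the equal weights of the $K$ uniform noise components, and all parameter values.

```python
import random
from collections import Counter


def zipf_signal(m, s):
    """Hypothetical heavy-tailed signal: p_i proportional to i**(-s), i = 1..m."""
    w = [i ** (-s) for i in range(1, m + 1)]
    total = sum(w)
    return [x / total for x in w]


def mixture_probabilities(signal, noise_block_sizes, lam):
    """Cell probabilities of lam * signal + (1 - lam) * noise, where the noise law is an
    equal-weight mixture of K uniform laws on disjoint blocks (K = len(noise_block_sizes)).
    Signal cells come first, followed by the noise blocks in order."""
    k = len(noise_block_sizes)
    probs = [lam * x for x in signal]
    for size in noise_block_sizes:
        probs.extend([(1 - lam) / (k * size)] * size)
    return probs


def sample_counts(probs, n, rng):
    """Multinomial counts for n independent draws from the given cell probabilities."""
    draws = rng.choices(range(len(probs)), weights=probs, k=n)
    tally = Counter(draws)
    return [tally.get(i, 0) for i in range(len(probs))]


if __name__ == "__main__":
    rng = random.Random(7)
    probs = mixture_probabilities(zipf_signal(50, 1.2), [200, 800], lam=0.5)  # K = 2
    counts = sample_counts(probs, 10_000, rng)
    print(sum(counts[:50]) / 10_000)  # empirical share of the signal cells
```

Since the first 50 cells carry the signal here, the printed share should be close to $\lambda = 0.5$.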


Algorithm 1 (NGS Diversity Analysis with $\mathcal{H}_\alpha$ or $\mathcal{D}_\alpha$)


(i) Exponent ($\alpha$) selection. Use problem-specific criteria (e.g., sample coverage, see Rempala and Seweryn (2013)) to identify the appropriate $\alpha$ value. If no prior knowledge exists, the value $\alpha = 1/2$ (the Bhattacharyya distance) may often be used.


(ii) Noise filtering. Identify the number of mixture components $K$ and the cut-off count(s) $k_m$ for the support of $\varepsilon_i$ in $(\tilde p, \tilde q)$ with a sequential procedure (starting from the lowest empirical frequency) based on Lemma 2 (i) with $p = \varepsilon_i$ ($i = 1, 2$). The value of $\lambda$ is then estimated as the proportion of the sample falling into the $m$ 'noise' categories.

(iii) Equality testing. For a pre-determined value of $\alpha$, test the hypothesis $H_0: p = q$ by comparing the observed value of $\hat{\mathcal{D}}_\alpha$ (alternatively, $\hat{\mathcal{I}}_\alpha$) with the asymptotic normal distribution in Theorem 4.
Population     | n      | m   | k_m | λ    | β            | $\hat{\mathcal H}_{1/2}$ | $ENC_{1/2}$
Antibiotic (p) | 39,084 | 165 | 17  | 0.46 | 0.869 (0.05) | 4.81 (4.79, 4.82)        | 122.73 (120.30, 123.97)
Control (q)    | 39,084 | 165 | 17  | 0.46 | 0.971 (0.05) | 4.64 (4.63, 4.67)        | 103.54 (102.51, 106.70)

$\hat{\mathcal D}_{1/2}(p\|q)$ = 0.155 (0.147, 0.163)


Table 1: Results of TCR data analysis. The mixture model (4.3) with heavy-tailed power laws fitted to two sets of TCR counts derived from mouse MLN before and after an antibiotic treatment, as described in Cebula et al (2013). The empirical Rényi entropy, the Hill number and the Rényi divergence CIs (in parentheses) are obtained from the CLT results of Theorems 1 and 2.


(iv) Difference quantification. If $H_0$ is not rejected, conclude that $\mathcal{D}_\alpha = 0$ ($\mathcal{I}_\alpha = 1$). Otherwise, apply Theorem 2 to obtain confidence bounds for $\mathcal{D}_\alpha$ ($\mathcal{I}_\alpha$).
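The point estimates used in steps (iii) and (iv) are plug-in functionals of the counts. The sketch below computes them, assuming the standard definitions $\mathcal H_\alpha(p) = (1-\alpha)^{-1}\log\sum_i p_i^\alpha$, $ENC_\alpha = \exp(\mathcal H_\alpha)$ and $\mathcal D_\alpha(p\|q) = (\alpha-1)^{-1}\log\sum_i p_i^\alpha q_i^{1-\alpha}$ with $0 < \alpha < 1$. The helper `filter_signal`, which keeps only categories whose counts exceed a cut-off $k$ in both samples, is a hypothetical stand-in for step (ii) and does not implement the sequential Lemma 2 procedure; the normal-theory bounds of Theorems 2 and 4 are not reproduced.

```python
import math


def renyi_entropy(counts, alpha):
    """Plug-in Renyi entropy (1 - alpha)^(-1) * log(sum_i phat_i^alpha), alpha != 1."""
    n = sum(counts)
    s = sum((c / n) ** alpha for c in counts if c > 0)
    return math.log(s) / (1 - alpha)


def hill_number(counts, alpha):
    """Effective number of species ENC_alpha = exp(H_alpha)."""
    return math.exp(renyi_entropy(counts, alpha))


def renyi_divergence(counts_p, counts_q, alpha):
    """Plug-in Renyi divergence (alpha - 1)^(-1) * log(sum_i phat_i^alpha qhat_i^(1 - alpha)).
    Restricted to 0 < alpha < 1, where cells empty in either sample contribute zero."""
    if not 0 < alpha < 1:
        raise ValueError("this sketch assumes 0 < alpha < 1")
    n_p, n_q = sum(counts_p), sum(counts_q)
    s = sum((a / n_p) ** alpha * (b / n_q) ** (1 - alpha)
            for a, b in zip(counts_p, counts_q) if a > 0 and b > 0)
    return math.log(s) / (alpha - 1)


def filter_signal(counts_p, counts_q, k):
    """Hypothetical noise filter: keep categories with count > k in both samples."""
    keep = [i for i, (a, b) in enumerate(zip(counts_p, counts_q)) if a > k and b > k]
    return [counts_p[i] for i in keep], [counts_q[i] for i in keep]
```

A typical call would be `p_sig, q_sig = filter_signal(counts_p, counts_q, 17)` followed by `renyi_divergence(p_sig, q_sig, 0.5)`. For $\alpha = 1/2$ the divergence computed here equals $-2\log\sum_i\sqrt{\hat p_i\hat q_i}$, i.e. twice the Bhattacharyya distance between the two empirical distributions.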


4.2.1 T-Cell Receptor Populations 

In this example we apply Algorithm 1 to measure similarity between a pair of T-cell receptor (TCR) populations based on the observed NGS counts of receptor-specific nucleotide sequences. With the current NGS technology, the two main difficulties in comparing TCR populations are adjusting for the under-sampling bias due to unobserved rare types and for the 'ghost' types created by sequencing errors (Wang et al 2014). The first problem may often be alleviated by applying diversity criteria, like the Rényi entropy and divergence, which allow for the sample-based up-weighting of rare counts (see Rempala and Seweryn 2013). The second one typically requires additional assumptions in order to perform the analysis, as outlined in Algorithm 1 (ii). A recent detailed overview of TCR diversity analysis methods was presented by Rempala and Seweryn (2013) and earlier, in the more general context of biodiversity, by Hsieh et al (2006) and Magurran (2005). For illustration, we analyze here two populations derived from the mesenteric lymph nodes (MLN) of a TCR mini-mouse before and after an antibiotic treatment. The details of the experiments and a dataset description are given in Cebula et al (2013). For the current analysis it is important to note that, since the experimental groups consisted of different animals, we may consider the two experimental groups as independent. The total combined sample size (or sequencing depth) was $n$ = 72,030, with initially $m_0$ = 6,336 receptor types. After performing step (ii) of Algorithm 1, $m$ = 165 types were identified as "signal" based on the cut-off $k_m$ = 17 in both populations. The signal population corresponded to the remaining sample size of 38,896, or about 54% of the original NGS counts. We used $\mathcal{D}_\alpha$ with $\alpha = 1/2$ as the diversity measure in steps (iii)-(iv) of Algorithm 1. Based on Theorem 2, the asymptotic p-value for testing $H_0: p = q$ was found to be less than $10^{-4}$, and hence the hypothesis of equal diversity of the two populations was rejected (see Algorithm 1 (iii)).

To compare this finding with a more standard parametric analysis, we additionally fitted power law distributions, by the least squares method, to the counts of the 165 receptor types in the two populations. Since the fitted exponents for the two populations differed, with $\beta_1 = .87$ (antibiotic-treated mice) and $\beta_2 = .97$ (untreated), the parametric analysis confirmed the findings of Algorithm 1. For illustration, the fitted power law quantiles are plotted against the empirical ones in Figure 3. Additionally, the diversity of each TCR population in terms of its Rényi entropy $\hat{\mathcal H}_{1/2}$ and Hill number $ENC_{1/2}$, as well as the diversity difference measured by the Rényi divergence $\hat{\mathcal D}_{1/2}$, are listed in Table 1 along with the corresponding asymptotic confidence intervals obtained via Theorems 1 and 2. As seen from the values in Table 1, although the diversity of the two NGS populations was relatively similar in terms of their count patterns, they differed in terms of the specific TCR types expressed.

[Figure 3 panels: $\beta_1 = 0.87$; $\beta_2 = 0.97$]

Figure 3: Power law fit for TCR data. QQ plot of the TCR data against quantiles of a power law distribution with $\beta_1 = 0.87$ (SE = .05) and $\beta_2 = 0.97$ (SE = .05), fitted via the least squares method.
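As a quick arithmetic check of Table 1 (our own, not part of the original analysis): since $ENC_{1/2} = \exp(\hat{\mathcal H}_{1/2})$ is a monotone transform, the reported Hill numbers and their interval endpoints agree, up to rounding of the entropies to two decimals, with the exponentiated entropy values and endpoints:

$$\begin{aligned}
e^{4.81} &\approx 122.73, & e^{4.79} &\approx 120.30, & e^{4.82} &\approx 123.97,\\
e^{4.64} &\approx 103.54, & e^{4.63} &\approx 102.51, & e^{4.67} &\approx 106.70.
\end{aligned}$$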


4.2.2 Gene Expression Profiling 

Beyond Algorithm 1, the results of Section 3 may be applied to facilitate various other biodiversity analyses, for instance, the simultaneous comparison of several pairs of molecular samples. We illustrate this with an NGS data example from the recent hepatocellular carcinoma (HCC) study of Chan et al (2014), which we obtained through the Gene Expression Omnibus (GEO) database. The GEO dataset consists of HCC tumor (T) and healthy liver (N) tissue samples from three individuals, denoted below as follows in relation to their original database designations: T1 = HCC448T, T2 = HCC413T, T3 = HCC510T and N1 = HCC448N, N2 = HCC413N, N3 = HCC510N. For this dataset, one of the questions of research interest was whether the expression profiles of genes associated with regulation of cell proliferation and programmed cell death differ across T and N samples as well as across individuals (cf., e.g., Kong et al 2013). To address this specific question, in contrast with the previous TCR example, we were only interested in a pre-selected subset of the NGS counts. The final values of $m = 1332$ and $n$ between 1.2 and 1.9 million reads¹ were obtained after aligning the pre-selected NGS fragments to the HG19 reference genome with the Tophat2/Bowtie2 software (Kim et al 2013) and performing the transcript annotation with the Ensembl genome browser (www.ensembl.org). After the final fragments-to-counts conversion, our data analysis was performed in three steps. First, the null hypothesis of tissue homogeneity $H_0 = \{T_1 = N_1 = T_2 = N_2 = T_3 = N_3\}$ was tested (and rejected) based on the result of Theorem 4 and the corresponding asymptotic p-value obtained from the $\chi^2(3)$ distribution. Next, the hypothesis of across-individuals homogeneity was tested by evaluating three pairwise null hypotheses $H_{ij} = \{\mathcal D_{1/2}(T_i, N_i) = \mathcal D_{1/2}(T_j, N_j)\}$, $1 \le i < j \le 3$ (each rejected), based on Theorem 4. Finally, having rejected the homogeneity hypotheses, we used the result of Theorem 2 to quantify the differences between the three sets of T and N tissue samples. The details of the analysis are presented in Table 2. The numerical results suggest that, despite the large individual differences between patients, the set of $m = 1332$ genes associated with cell proliferation and death may be used to distinguish between T-type and N-type samples in HCC patients.


1 Based on these values, the empirical versions of the conditions for the relevant theorems in Section 3 were considered satisfied.

Hypothesis: $H_0$ (all samples); $H_{1,2}$; $H_{2,3}$; $H_{3,1}$
Statistic: $\sum_i[\hat{\mathcal D}_{1/2}(i) - \bar{\mathcal D}_{1/2}]^2$; $\hat{\mathcal D}_{1/2}(1) - \hat{\mathcal D}_{1/2}(2)$; $\hat{\mathcal D}_{1/2}(2) - \hat{\mathcal D}_{1/2}(3)$; $\hat{\mathcal D}_{1/2}(3) - \hat{\mathcal D}_{1/2}(1)$
P-value: <0.001; <0.01; NA
$\hat{\mathcal D}_{1/2}$ value (CI): $\hat{\mathcal D}_{1/2}(1)$ = 0.553 (0.551, 0.555); $\hat{\mathcal D}_{1/2}(2)$ = 0.292 (0.291, 0.294); $\hat{\mathcal D}_{1/2}(3)$ = 0.346 (0.345, 0.348)

Table 2: The 95% confidence intervals for the pairwise symmetric Rényi divergence $\mathcal D_{1/2}$ between the tumor and control (healthy) tissues from three individuals, based on the expression profile of pre-selected $m = 1332$ transcripts related to cell proliferation. Here $\hat{\mathcal D}_{1/2}(i)$ denotes $\hat{\mathcal D}_{1/2}(T_i, N_i)$.


5 Summary and Conclusions

We derived two sets of limit theorems for the Rényi entropy and divergence statistics. The first set of results holds for linearizable statistics (whose first-order Taylor terms do not vanish), whereas the second one holds in the degenerate case (when the first-order terms vanish) and requires analyzing the quadratic terms in the Taylor expansions. Our Rényi entropy limit theorems complement those obtained elsewhere for the Shannon entropy and divergence.

Based on the CLT results, we have proposed here a new framework for analyzing the diversity of molecular (especially NGS) data, based on the idea of analyzing frequency/contingency tables where cell counts are highly unbalanced (for instance, as arising from mixtures of heavy-tailed, power-law type and uniform distributions) and the number of cells or, equivalently, the support size $m$ of the count distribution increases with the sample size $n$. For analyzing such tables, we suggested using the empirical Rényi entropy and divergence as the statistical measures of, respectively, diversity and pairwise similarity of different molecular sub-populations.

In the two examples of NGS analysis we have shown how the Rényi entropy methods may be used for filtering out low frequency noise and for establishing valid confidence bounds in pairwise divergence analysis for pre-selected transcripts. However, it was also seen that, in order to apply our CLT results, the number of transcripts had to be small relative to the sequencing depth. For the special class of heavy-tailed power law distributions, our results in particular indicate that the appropriate entropy CLTs are valid (and thus so is our proposed analysis framework) when, roughly speaking, $m/\sqrt{n} \to 0$, and not otherwise. As such a restriction may often be limiting for very high diversity NGS data, other statistics beyond those discussed here and not requiring such a condition could also be of interest. We hope to pursue this matter further in future work.
Appendix


A Proofs for Non-Degenerate Projections (Section 3.1) 

Auxiliary Results 

First, we establish the following simple result on binomial moments. 

Lemma 3 (Binomial moment bound). Let $\lfloor x \rfloor$ denote the largest integer smaller than or equal to $x$ and let $\hat p_n$ be an empirical binomial proportion from $n$ independent Bernoulli trials with success probability $0 < p_n < 1$. Assume $np_n \to \infty$ as $n \to \infty$. Then for any integer $d \ge 1$ and sufficiently large $n$

$$|E(\hat p_n n - p_n n)^d| \le C_d\,(np_n)^{\lfloor d/2 \rfloor}$$

for some universal ($n$-free) constant $C_d$.


Proof. Let $X$ be a binomial $\mathrm{Bin}(n, p_n)$ random variable and set $\mu = np_n$. Then (see, e.g., Knoblauch (2008))

$$E(X-\mu)^d = \sum_{i=0}^{d}\binom{d}{i}(-\mu)^{d-i}\sum_{k=0}^{i}{i \brace k}\, n^{\underline{k}}\, p_n^{k},$$

where ${i \brace k}$ denotes a Stirling number of the second kind (i.e., the number of ways to partition a set of $i$ objects into $k$ non-empty subsets) and $n^{\underline{k}} = n(n-1)\cdots(n-k+1)$. Let $c_{d,k}$ denote the coefficient at $\mu^k$ in the expression for $E(X-\mu)^d$. Then for $1 \le k \le d$ [display not recoverable] and, using the recursions for the Stirling numbers and the binomial coefficients, one obtains the recursion (A.1) [display not recoverable].

Let us argue that for any $d \ge 1$ we have

$$c_{d,k} = 0 \quad \text{for } k \text{ such that } d/2 < k \le d. \qquad (A.2)$$

The proof of (A.2) is by induction with respect to $d \ge 1$. Note that the statement is true for $d = 1$ due to $c_{d,d} = 0$ for $d \ge 1$ (but $c_{0,0} = 1$). Now, if $k > (d+1)/2$ then $k - 1 > (d-1)/2$ and $k > d/2$, and thus (A.1) implies $c_{d+1,k} = 0$ for $k > (d+1)/2$, since the induction assumption implies $c_{d-1,k-1} = c_{d,k} = 0$. Hence (A.2) holds and consequently the highest power of $\mu$ in the expansion of $E(X-\mu)^d$ cannot exceed $d/2$. This yields the assertion of the lemma.

□
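The bound of Lemma 3 can be checked numerically in small cases. The sketch below computes exact central moments of $\mathrm{Bin}(n, p)$ with rational arithmetic by summing over the probability mass function (a direct computation, not the Stirling-number route of the proof) and reports $|E(X-np)^d|/(np)^{\lfloor d/2 \rfloor}$, which the lemma says stays bounded as $np$ grows. Function names and parameter values are illustrative.

```python
from fractions import Fraction
from math import comb


def binomial_central_moment(n, p, d):
    """Exact E(X - np)^d for X ~ Bin(n, p); pass p as a Fraction for exact arithmetic."""
    mu = n * p
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) * (x - mu) ** d
               for x in range(n + 1))


def moment_ratio(n, p, d):
    """|E(X - np)^d| / (np)^floor(d/2), the quantity bounded in Lemma 3."""
    return abs(binomial_central_moment(n, p, d)) / (n * p) ** (d // 2)


if __name__ == "__main__":
    p = Fraction(1, 4)
    for d in (3, 4, 5, 6):
        # the ratios should stay bounded as n (and hence np) doubles
        print(d, [round(float(moment_ratio(n, p, d)), 3) for n in (20, 40, 80, 160)])
```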

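Lemma 4. Set $\tilde W_n^{(\alpha)} = (W_n^{(\alpha)} - EW_n^{(\alpha)})/(\mathrm{Var}W_n^{(\alpha)})^{1/2}$. Under the assumptions of Theorem 1 the Lindeberg condition

$$\forall\,\varepsilon > 0 \qquad E\big(\tilde W_n^{(\alpha)}\big)^2 I\big(|\tilde W_n^{(\alpha)}| > \varepsilon\sqrt{n}\big) \to 0, \quad n \to \infty, \qquad (A.3)$$

is satisfied. Consequently,

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\tilde W_{ni}^{(\alpha)} \Rightarrow N(0,1)$$

with $n$ iid random variables $\tilde W_{ni}^{(\alpha)}$ equidistributed with $\tilde W_n^{(\alpha)}$. Moreover, the result remains true if we replace $W_n^{(\alpha)}$ above by $V_n^{(\alpha)}$ under the assumptions of Theorem 2.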

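Proof. We shall only prove the statement for $W_n^{(\alpha)}$, as the proof for $V_n^{(\alpha)}$ is similar. For notational convenience, set $\sigma_n^2 = \mathrm{Var}W_n^{(\alpha)}$, $\mu_n = EW_n^{(\alpha)}$ and $\tilde W_n = \tilde W_n^{(\alpha)}$. In view of (3.1) we have, as $n \to \infty$,

$$\frac{\mu_n}{\sqrt{n}\,\sigma_n} = \frac{\alpha\sum_i p_i^{\alpha}}{\sqrt{n}\,\sigma_n} \le \frac{\alpha\max_i p_i^{\alpha-1}}{\sqrt{n}\,\sigma_n} \to 0. \qquad (A.4)$$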


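Note that $\tilde W_n = \alpha\sigma_n^{-1}\sum_i p_i^{\alpha-1}(\delta_i - p_i)$, where the vector $(\delta_1, \dots, \delta_m)$ represents a single trial multinomial random vector with parameters $(p_1, \dots, p_m)$. Since exactly one $\delta_i$ equals 1 and $(a-b)^2 \le a^2 + b^2$ for $a, b \ge 0$, for any $\varepsilon > 0$

$$E\tilde W_n^2 I(|\tilde W_n| > \varepsilon\sqrt{n}) = \alpha^2\sigma_n^{-2}E\Big(\sum_i p_i^{\alpha-1}(\delta_i - p_i)\Big)^2 I(|\tilde W_n| > \varepsilon\sqrt{n})$$
$$\le \alpha^2\sigma_n^{-2}E\Big[\Big(\sum_i p_i^{\alpha}\Big)^2 + \sum_i \delta_i\, p_i^{2(\alpha-1)}\Big] I(|\tilde W_n| > \varepsilon\sqrt{n}). \qquad (A.5)$$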

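Since by (A.4) $\mu_n = o((n\sigma_n^2)^{1/2})$, by the definition of $\delta_i$, for sufficiently large $n$ we have

$$\{\omega : |\tilde W_n| > \varepsilon\sqrt{n}\} = \Big\{\omega : \alpha\Big|\sum_i p_i^{\alpha-1}(\delta_i - p_i)\Big| > \varepsilon\sqrt{n}\,\sigma_n\Big\}$$
$$= \{\omega : \delta_i = 1 \text{ for } i \text{ such that } |\alpha p_i^{\alpha-1} - \mu_n| > \varepsilon\sqrt{n}\,\sigma_n\}$$
$$\subset \{\omega : \delta_i = 1 \text{ for } i \text{ such that } \alpha p_i^{\alpha-1} > \tfrac{\varepsilon}{2}\sqrt{n}\,\sigma_n\}$$
$$=: \{\omega : \delta_i = 1 \text{ for } i \in J_n\},$$

where the last equality defines the set of indices $J_n$. Note that the size of the set $J_n$ satisfies $|J_n| \to 0$ as $n \to \infty$, due to $\max_{1 \le i \le m_n} p_i^{\alpha-1}/(\sqrt{n}\,\sigma_n) \to 0$ as $n \to \infty$, which is implied by (3.1). This and (A.5) therefore give (at least for large $n$)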

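$$E\tilde W_n^2 I(|\tilde W_n| > \varepsilon\sqrt{n}) \le \sigma_n^{-2}\sum_{i \in J_n} p_i\big(\mu_n^2 + \alpha^2 p_i^{2(\alpha-1)}\big) = \Big(\frac{\mu_n}{\sigma_n}\Big)^2\sum_{i \in J_n} p_i + \alpha^2\sigma_n^{-2}\sum_{i \in J_n} p_i^{2\alpha-1}$$
$$\le \frac{2\alpha}{\varepsilon\sqrt{n}\,\sigma_n}\Big(\frac{\mu_n}{\sigma_n}\Big)^2\sum_{i \in J_n} p_i^{\alpha} + \Big(\frac{\mu_n^2}{\sigma_n^2} + 1\Big)\frac{\alpha^2\sum_{i \in J_n} p_i^{2\alpha-1}}{\mu_n^2 + \sigma_n^2} \to 0$$

as $n \to \infty$, since $\sup_n(\mu_n/\sigma_n)^2 < \infty$ by the assumptions of Theorem 1 and $\alpha^2\sum_i p_i^{2\alpha-1} = \mu_n^2 + \sigma_n^2$. The weak convergence assertion now follows by the Lindeberg central limit theorem (see, e.g., Shao (2003), Chapter 1).

□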
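Proof of Theorem 1

Let us first establish part (ii). Note that (3.1) implies that

$$a_n^2 = n/\mathrm{Var}W_n^{(\alpha)} \to \infty \qquad (A.6)$$

in view of $\mathrm{Var}W_n^{(\alpha)} \le E(W_n^{(\alpha)})^2 \le \alpha\max_i p_i^{\alpha-1}\,EW_n^{(\alpha)} = \alpha^2\max_i p_i^{\alpha-1}\sum_i p_i^{\alpha}$, which yields $a_n \ge (n\mathrm{Var}W_n^{(\alpha)})^{1/2}\big(\alpha^2\max_i p_i^{\alpha-1}\sum_i p_i^{\alpha}\big)^{-1}$. By Taylor's expansion

$$\hat{\mathcal I}_\alpha(p) - \mathcal I_\alpha(p) = \sum_i \hat p_i^{\alpha} - \sum_i p_i^{\alpha} = \sum_i \alpha p_i^{\alpha-1}(\hat p_i - p_i) + R_n, \qquad (A.7)$$

where

$$R_n = \binom{\alpha}{2}\sum_i p_i^{\alpha-2}\Big(1 + \theta_i\Big(\frac{\hat p_i}{p_i} - 1\Big)\Big)^{\alpha-2}(\hat p_i - p_i)^2$$

for some random $\theta_i \in (0,1)$. Fixing $\delta \in (0, 1/2)$, for any $\varepsilon > 0$ we have

$$P(|a_nR_n| > \varepsilon) \le P\Big(|a_nR_n| > \varepsilon,\ \max_i \theta_i\Big|\frac{\hat p_i}{p_i} - 1\Big| \le \delta\Big) + P\Big(\max_i\Big|\frac{\hat p_i}{p_i} - 1\Big| > \delta\Big) =: (I) + (II).$$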


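First, note that, since $E(\hat p_i - p_i)^2 = p_i(1-p_i)/n \le p_i/n$,

$$(I) \le \frac{a_n}{\varepsilon}E|R_n|\,I\Big(\max_i \theta_i\Big|\frac{\hat p_i}{p_i} - 1\Big| \le \delta\Big) \le \frac{a_n}{\varepsilon}\Big|\binom{\alpha}{2}\Big|\max\{(1-\delta)^{\alpha-2}, (1+\delta)^{\alpha-2}\}\sum_i p_i^{\alpha-2}E(\hat p_i - p_i)^2 \le \frac{C_1 a_n}{\varepsilon n}\sum_i p_i^{\alpha-1}.$$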


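Now, recall the condition (2.5) and consider $d \ge 1$ large enough so that $d\tau > 1$ and hence $(np_i)^{-d} < n^{-1}$ for sufficiently large $n$. Applying Boole's (subadditivity) inequality bound and Lemma 3 to the $2d$-th central moments of the $\hat p_i$'s, we get

$$(II) \le \sum_i P(|\hat p_i - p_i| > \delta p_i) \le \sum_i \frac{E(n\hat p_i - np_i)^{2d}}{\delta^{2d}(np_i)^{2d}} \le \frac{C_2}{\delta^{2d}}\sum_i (np_i)^{-d} \le \frac{C_3}{\delta^{2d}n}\sum_i p_i^{\alpha-1},$$

where $C_1, C_2, C_3$ are constants independent of $n$ and $p_i$ (with $C_2$ being $C_{2d}$ of Lemma 3). Therefore, for any $\varepsilon > 0$ and the numerical sequence $a_n = |a_n| = (n/\mathrm{Var}W_n^{(\alpha)})^{1/2} \to \infty$ (cf. (A.6)), as well as a possibly different set of constants $C_1, C_2, C_3$,

$$P(|a_nR_n| > \varepsilon) \le \max\Big(\frac{C_1}{\varepsilon}, \frac{C_3}{\delta^{2d}}\Big)\frac{a_n}{n}\sum_i p_i^{\alpha-1} \to 0 \qquad (A.8)$$

as $n \to \infty$, due to (3.1). Note that the random variable $\alpha\sum_i p_i^{\alpha-1}(\hat p_i - p_i)$ has the same distribution as $n^{-1}\sum_{k=1}^{n}(W_{nk}^{(\alpha)} - EW_n^{(\alpha)})$, with iid random variables $W_{nk}^{(\alpha)}$ distributed as (2.6), and that, due to (A.8), $a_nR_n = \sqrt{n}R_n/(\mathrm{Var}W_n^{(\alpha)})^{1/2} = o_p(1)$. Since Lemma 4 ensures that $n^{-1/2}\sum_{k=1}^{n}(W_{nk}^{(\alpha)} - EW_n^{(\alpha)})/(\mathrm{Var}W_n^{(\alpha)})^{1/2} \Rightarrow N(0,1)$, the result follows.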

To argue part (i), note that from the definition (2.6) we have $EW_n^{(\alpha)} = \alpha\mathcal I_\alpha(p) = \alpha\sum_i p_i^{\alpha}$, and by (A.6) $\tilde a_n = (n\mathcal I_\alpha(p)^2/\mathrm{Var}W_n^{(\alpha)})^{1/2} \to \infty$. Consequently, part (i) follows immediately from part (ii).

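Finally, we show part (iii). Consider arbitrary $\delta \in (0,1)$ and note that on the event $\{|\hat{\mathcal I}_\alpha(p)/\mathcal I_\alpha(p) - 1| < \delta\}$, by Taylor's expansion for $|x| < 1$, we have

$$\log(1+x) = x - \frac{x^2}{2(1+\theta x)^2}, \qquad (A.9)$$

where $\theta \in (0,1)$, and hence

$$(1-\alpha)\big(\hat{\mathcal H}_\alpha(p) - \mathcal H_\alpha(p)\big) = \log\frac{\hat{\mathcal I}_\alpha(p)}{\mathcal I_\alpha(p)} = \frac{\hat{\mathcal I}_\alpha(p)}{\mathcal I_\alpha(p)} - 1 - \frac{\big(\hat{\mathcal I}_\alpha(p)/\mathcal I_\alpha(p) - 1\big)^2}{2\big(1 + \theta(\hat{\mathcal I}_\alpha(p)/\mathcal I_\alpha(p) - 1)\big)^2} = \frac{\hat{\mathcal I}_\alpha(p) - \mathcal I_\alpha(p)}{\mathcal I_\alpha(p)} + T_n, \qquad (A.10)$$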


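where the last equation defines $T_n$. By applying again the expansion argument used in the proof of part (ii), with $\hat{\mathcal I}_\alpha(p)/\mathcal I_\alpha(p)$ in place of $\hat{\mathcal I}_\alpha(p)$, $R_n/\mathcal I_\alpha(p)$ in place of $R_n$ and the sequence $\tilde a_n = (n\mathcal I_\alpha(p)^2/\mathrm{Var}W_n^{(\alpha)})^{1/2} \to \infty$, we see that

$$\tilde a_n\big(\hat{\mathcal I}_\alpha(p)/\mathcal I_\alpha(p) - 1\big) = \sqrt{n}\,\frac{\hat{\mathcal I}_\alpha(p) - \mathcal I_\alpha(p)}{(\mathrm{Var}W_n^{(\alpha)})^{1/2}} \Rightarrow N(0,1) \qquad (A.11)$$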


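as $n \to \infty$. Note that for any $\varepsilon > 0$

$$P\big(\tilde a_n|T_n| > \varepsilon,\ |\hat{\mathcal I}_\alpha(p)/\mathcal I_\alpha(p) - 1| < \delta\big) \le P\big(\tilde a_n|\hat{\mathcal I}_\alpha(p)/\mathcal I_\alpha(p) - 1| > 2\varepsilon(1-\delta)^2/\delta\big)$$

and therefore

$$P(\tilde a_n|T_n| > \varepsilon) \le P\big(|\hat{\mathcal I}_\alpha(p)/\mathcal I_\alpha(p) - 1| \ge \delta\big) + P\big(\tilde a_n|\hat{\mathcal I}_\alpha(p)/\mathcal I_\alpha(p) - 1| > 2\varepsilon(1-\delta)^2/\delta\big). \qquad (A.12)$$

In view of (i) and (A.11), denoting the absolute value of an $N(0,1)$ variable by $|N|$, (A.12) yields

$$\limsup_n P(\tilde a_n|T_n| > \varepsilon) \le 0 + P\big(|N| > 2\varepsilon(1-\delta)^2/\delta\big).$$

By taking $\delta > 0$ sufficiently small we get

$$\limsup_n P(\tilde a_n|T_n| > \varepsilon) \le \gamma$$

for arbitrary $\gamma > 0$, and therefore $\limsup_n P(\tilde a_n|T_n| > \varepsilon) = 0$. Thus for any $x$ we have

$$P\Big(\frac{\sqrt{n}\,\mathcal I_\alpha(p)}{(\mathrm{Var}W_n^{(\alpha)})^{1/2}}(1-\alpha)\big(\hat{\mathcal H}_\alpha(p) - \mathcal H_\alpha(p)\big) \le x\Big) = P\Big(\sqrt{n}\,\frac{\hat{\mathcal I}_\alpha(p) - \mathcal I_\alpha(p)}{(\mathrm{Var}W_n^{(\alpha)})^{1/2}} + o_p(1) \le x\Big),$$

and the result follows from part (ii). □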
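Read together with (A.10)-(A.11), part (iii) suggests a plug-in interval for $\mathcal H_\alpha(p)$: $\sqrt n(1-\alpha)\mathcal I_\alpha(p)(\hat{\mathcal H}_\alpha(p) - \mathcal H_\alpha(p))/(\mathrm{Var}W_n^{(\alpha)})^{1/2}$ is asymptotically standard normal, and, as in the proof of Lemma 4 ($W_n^{(\alpha)}$ takes the value $\alpha p_i^{\alpha-1}$ with probability $p_i$), $\mathrm{Var}W_n^{(\alpha)} = \alpha^2\big(\sum_i p_i^{2\alpha-1} - (\sum_i p_i^\alpha)^2\big)$. The sketch below forms a Wald-type interval by replacing $p$ with $\hat p$ in this variance; that substitution is our assumption, not a procedure stated here. It is meaningful only where the assumptions of Theorem 1 (in particular (3.1)) are plausible, and it degenerates for the uniform law, which is the case covered by Theorem 3 instead.

```python
import math


def renyi_entropy_ci(counts, alpha, z=1.96):
    """Plug-in Renyi entropy with a Wald-type interval based on the scaling of
    Theorem 1 (iii). Var W is estimated by plugging in empirical frequencies."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    n = sum(counts)
    p_hat = [c / n for c in counts if c > 0]
    i_alpha = sum(x ** alpha for x in p_hat)            # plug-in I_alpha = sum p^alpha
    second = sum(x ** (2 * alpha - 1) for x in p_hat)   # E W^2 / alpha^2
    var_w = alpha ** 2 * (second - i_alpha ** 2)        # Var W = E W^2 - (E W)^2
    h_hat = math.log(i_alpha) / (1 - alpha)
    se = math.sqrt(max(var_w, 0.0) / n) / (abs(1 - alpha) * i_alpha)
    return h_hat, (h_hat - z * se, h_hat + z * se)


if __name__ == "__main__":
    print(renyi_entropy_ci([50, 30, 20, 10, 5, 5], 0.5))
```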

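Proof of Theorem 2

The proof follows closely that of Theorem 1 with some obvious modifications. For illustration, we shall only argue part (ii). Let us first note that, in parallel with (A.6), $b_n = (n/\mathrm{Var}V_n^{(\alpha)})^{1/2} \to \infty$. Indeed, bounding $E(V_n^{(\alpha)})^2$ from above [display not recoverable] yields a lower bound (A.13) for $2b_n$ in terms of $(n\mathrm{Var}V_n^{(\alpha)})^{1/2}$, which diverges due to (3.3). Next, we show that the limiting distribution is determined by the projection $V_n^{(\alpha)}$. By the bivariate Taylor expansion, one obtains

$$b_n\big(\hat{\mathcal I}_\alpha(p,q) - \mathcal I_\alpha(p,q)\big) = b_n\Big(\alpha\sum_i\Big(\frac{q_i}{p_i}\Big)^{1-\alpha}(\hat p_i - p_i) + (1-\alpha)\sum_i\Big(\frac{p_i}{q_i}\Big)^{\alpha}(\hat q_i - q_i)\Big) + b_nR_n,$$

where

$$R_n = \sum_i\ \sum_{\{(k,l):\,k,l \ge 0,\ k+l=2\}}\frac{1}{k!\,l!}\,\frac{\partial^2(x^{\alpha}y^{1-\alpha})}{\partial x^k\,\partial y^l}\Big|_{(x,y)=(\tilde p_i,\tilde q_i)}(\hat p_i - p_i)^k(\hat q_i - q_i)^l,$$

with $(\tilde p_i, \tilde q_i) = (p_i, q_i) + \theta_i(\hat p_i - p_i, \hat q_i - q_i)$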


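and $0 < \theta_i < 1$ for all $i$. For the mixed derivatives term, by virtue of the elementary inequality $2ab \le a^2 + b^2$ (with $a = \alpha(\hat p_i - p_i)/\tilde p_i$ and $b = (1-\alpha)(\hat q_i - q_i)/\tilde q_i$), we have

$$2\alpha(1-\alpha)\sum_i \tilde p_i^{\alpha-1}\tilde q_i^{-\alpha}(\hat p_i - p_i)(\hat q_i - q_i) \le \alpha^2\sum_i \tilde p_i^{\alpha-2}\tilde q_i^{1-\alpha}(\hat p_i - p_i)^2 + (1-\alpha)^2\sum_i \tilde p_i^{\alpha}\tilde q_i^{-\alpha-1}(\hat q_i - q_i)^2,$$

therefore

$$|b_nR_n| \le C\,b_n\big(R_n^{(1)} + R_n^{(2)}\big)$$

for a constant $C$ depending only on $\alpha$, with

$$R_n^{(1)} = \sum_i p_i^{\alpha}q_i^{1-\alpha}\Big(\frac{\tilde p_i}{p_i}\Big)^{\alpha-2}\Big(\frac{\tilde q_i}{q_i}\Big)^{1-\alpha}\Big(\frac{\hat p_i - p_i}{p_i}\Big)^2$$

and

$$R_n^{(2)} = \sum_i p_i^{\alpha}q_i^{1-\alpha}\Big(\frac{\tilde p_i}{p_i}\Big)^{\alpha}\Big(\frac{\tilde q_i}{q_i}\Big)^{-\alpha-1}\Big(\frac{\hat q_i - q_i}{q_i}\Big)^2.$$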

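Clearly, it now suffices to show only that $b_nR_n^{(i)} = o_p(1)$ for $i = 1, 2$. We only prove the second relation; the other one follows similarly. Analogously as in the proof of Theorem 1, taking some small $\delta > 0$, we have

$$P(b_nR_n^{(2)} > \varepsilon) \le P\Big(b_nR_n^{(2)} > \varepsilon,\ \max_i \theta_i\Big(\Big|\frac{\hat p_i}{p_i} - 1\Big| + \Big|\frac{\hat q_i}{q_i} - 1\Big|\Big) < \delta\Big) + P\Big(\max_i \theta_i\Big(\Big|\frac{\hat p_i}{p_i} - 1\Big| + \Big|\frac{\hat q_i}{q_i} - 1\Big|\Big) \ge \delta\Big) =: (I) + (II).$$

Apropos (I), for some universal ($n$-free and $\delta$-free) constant $C$ we have

$$(I) \le \frac{b_n}{\varepsilon}\sum_i p_i^{\alpha}q_i^{1-\alpha}(1+\delta)^{\alpha}(1-\delta)^{-\alpha-1}E\Big(\frac{\hat q_i - q_i}{q_i}\Big)^2 \le \frac{C}{\varepsilon}\,\frac{\sum_i p_i^{\alpha}q_i^{-\alpha}}{(n\mathrm{Var}V_n^{(\alpha)})^{1/2}} \to 0$$

by (3.3). Apropos (II), we have

$$(II) \le P\Big(\max_i\Big|\frac{\hat p_i}{p_i} - 1\Big| \ge \delta/2\Big) + P\Big(\max_i\Big|\frac{\hat q_i}{q_i} - 1\Big| \ge \delta/2\Big) =: (IIa) + (IIb).$$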
Note that for $d$ large enough so that $d\tau > 1$, in view of (2.5) and the Boole inequality bound combined with the result of Lemma 3,

$$(IIa) \le \sum_i P\Big(|\hat p_i - p_i| \ge \frac{\delta p_i}{2}\Big) \le C_{2d}\Big(\frac{2}{\delta}\Big)^{2d}\sum_i (np_i)^{-d} \to 0.$$

Similarly,

$$(IIb) \le C_{2d}\Big(\frac{2}{\delta}\Big)^{2d}\sum_i (nq_i)^{-d} \to 0,$$

and therefore $b_nR_n^{(2)} = o_p(1)$, and by a similar argument $b_nR_n^{(1)} = o_p(1)$. Consequently, $b_nR_n = o_p(1)$. Finally, since the distribution of $\alpha\sum_i(q_i/p_i)^{1-\alpha}(\hat p_i - p_i) + (1-\alpha)\sum_i(p_i/q_i)^{\alpha}(\hat q_i - q_i)$ is equal to that of $n^{-1}\sum_{k=1}^{n}(V_{nk}^{(\alpha)} - EV_n^{(\alpha)})$, where the $V_{nk}^{(\alpha)}$ are independent and distributed according to $V_n^{(\alpha)}$ given in (2.7), the result follows by Lemma 4. □

B Proofs for Degenerate Projections (Section 3.2) 

Auxiliary Results 

The following lemma is cited after Koroljuk and Borovskich (1994, Theorem 4.7.3, page 162).


Lemma 5 (Degenerate U-statistic CLT). Let $X_1, \dots, X_n$ be a sequence of iid random elements and let

$$U_n(X_1, \dots, X_n) = \binom{n}{2}^{-1}\sum_{1 \le k < l \le n} h_n(X_k, X_l)$$

be a U-statistic of order two with a symmetric, real-valued kernel $h_n(x, y)$ which depends on $n$ and satisfies $Eh_n(X, y) = 0$. Denote also $\Psi_n(y, z) = E\big(h_n(X, y)h_n(X, z)\big)$. Assume that $Eh_n^4 < \infty$ and set $\sigma_n^2 = Eh_n^2$. If the conditions

$$n^{-1}\sigma_n^{-4}Eh_n^4 \to 0 \qquad (B.1)$$
$$\sigma_n^{-4}E\Psi_n^2 \to 0 \qquad (B.2)$$

are satisfied, then

$$nU_n/(\sqrt{2}\,\sigma_n) \Rightarrow N(0,1).$$

□


Proof of Lemma 2

We start by showing (i). To this end, let $X_k$ for $k = 1, \dots, n$ be iid single trial multinomial variables with parameter $p$ and denote $I(X_k = i) = \delta_i(X_k)$. Note the identity

$$\chi^2_p(\hat p, p) - m + 1 = n^{-1}\sum_i p_i^{-1}\Big(\sum_k(\delta_i(X_k) - p_i)\Big)^2 - m + 1$$
$$= n^{-1}\sum_i p_i^{-1}\sum_{k \ne l}\delta_i(X_k)\delta_i(X_l) + n^{-1}\sum_i p_i^{-1}\sum_k \delta_i(X_k) - n - m + 1$$
$$= n^{-1}\sum_{k \ne l}\Big(\sum_i p_i^{-1}\delta_i(X_k)\delta_i(X_l) - 1\Big) + n^{-1}\sum_k\Big(\sum_i p_i^{-1}\delta_i(X_k) - m\Big)$$
$$= (n-1)U_n^{(1)} + R_n^{(1)}.$$

Here $U_n^{(1)}$ is the U-statistic with order-two kernel $h_n(X_1, X_2) = \sum_i p_i^{-1}I(X_1 = X_2 = i) - 1$, which is degenerate, i.e., satisfies $Eh_n(X_1, x_2) = 0$. The remaining term is $R_n^{(1)} = n^{-1}\sum_k(V_k - EV)$, where the $V_k$ are iid, equidistributed with, say, $V$ such that $P(V = p_i^{-1}) = p_i$. We will argue that

$$(n-1)U_n^{(1)}/\sqrt{2(m-1)} \Rightarrow N(0,1) \qquad (B.3)$$

and $R_n^{(1)}/\sqrt{m} = o_p(1)$. The second relation follows easily, since $m^{-1}\mathrm{Var}R_n^{(1)} = (nm)^{-1}\mathrm{Var}V = (nm)^{-1}\big(\sum_i p_i^{-1} - m^2\big) \to 0$ by assumption (3.5).

The convergence (B.3) will follow from Lemma 5 and Slutsky's theorem upon checking the conditions (B.1) and (B.2). To this end note that, in the notation of Lemma 5, $\sigma_n^2 = \mathrm{Var}\,h_n = m - 1$, since $\mathrm{Var}[h_n(X_1, X_2)] = E[h_n(X_1, X_2)]^2 = \sum_i p_iE(p_i^{-1}I(X_1 = i) - 1)^2 = \sum_i p_iE\big(p_i^{-2}I(X_1 = i) + 1 - 2p_i^{-1}I(X_1 = i)\big) = m - 1$. Similarly, $Eh_n^4 = \sum_i p_iE(p_i^{-1}I(X_1 = i) - 1)^4 = \sum_i(p_i^{-2} - 4p_i^{-1}) + 6m - 3$. Therefore

$$n^{-1}Eh_n^4/\sigma_n^4 \le \frac{\sum_i(p_i^{-2} - 4p_i^{-1}) + 6m - 3}{n(m-1)^2} \le C\,\frac{\sum_i p_i^{-2}}{nm^2} \to 0$$

due to (3.5), and thus (B.1) follows. In order to verify (B.2), consider first $\Psi_n(x, y) = E[h_n(X_1, x)h_n(X_1, y)] = p_x^{-1}I(x = y) - 1$. Since $E\Psi_n^2(X_1, X_2) = \sum_i p_iE\Psi_n^2(X_1, i) = \sum_i p_iE[p_i^{-1}I(X_1 = i) - 1]^2 = m - 1$, we have $E\Psi_n^2(X_1, X_2)/\sigma_n^4 \to 0$ and (B.2) follows as well. Hence (B.3) follows and yields the assertion (i).
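The algebraic identity behind (B.3) is easy to verify numerically. The sketch below (function name ours) computes both sides of $\chi^2_p(\hat p, p) - m + 1 = (n-1)U_n^{(1)} + R_n^{(1)}$ for a simulated sample, evaluating the U-statistic through the cell counts via $\sum_{k \ne l} h_n(X_k, X_l) = \sum_i N_i(N_i - 1)/p_i - n(n-1)$, where $N_i$ is the count in cell $i$.

```python
import random


def chi2_decomposition(sample, p):
    """Return (lhs, rhs) of chi2 - m + 1 = (n - 1) U_n + R_n for a sample of category
    indices 0..m-1 drawn from p; U_n uses the degenerate kernel h(x, y) = I(x = y)/p_x - 1."""
    n, m = len(sample), len(p)
    counts = [0] * m
    for x in sample:
        counts[x] += 1
    chi2 = sum((c - n * pi) ** 2 / (n * pi) for c, pi in zip(counts, p))
    # sum of h(X_k, X_l) over ordered pairs k != l, computed from the cell counts
    pair_sum = sum(c * (c - 1) / pi for c, pi in zip(counts, p)) - n * (n - 1)
    u_stat = pair_sum / (n * (n - 1))
    # R_n = n^{-1} sum_k (1/p_{X_k} - m), a centered iid average
    r_term = sum(1.0 / p[x] - m for x in sample) / n
    return chi2 - m + 1, (n - 1) * u_stat + r_term


if __name__ == "__main__":
    rng = random.Random(3)
    probs = [0.1, 0.2, 0.3, 0.4]
    sample = rng.choices(range(len(probs)), weights=probs, k=500)
    print(chi2_decomposition(sample, probs))
```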

Now consider part (ii). In parallel to part (i), define (cf. Section 2) $Z_k = (X_k, Y_k)$ for $k = 1, \dots, n$ as a sequence of independent bivariate random variables distributed according to $Z = (X, Y)$. Additionally, for $i = 1, \dots, m$, as before let $\delta_i(X_k) = I(X_k = i)$, as well as $\delta_i(Y_k) = I(Y_k = i)$. Set also $A_i(Z_k) = A_i(X_k, Y_k) = \delta_i(X_k) - \delta_i(Y_k)$. Note that for given $i$ the $A_i(Z_k)$'s for $k = 1, \dots, n$ are independent variables distributed according to

$$A_i(Z) = \begin{cases} 0 & \text{with prob. } 1 - 2(p_i - p_{ii}),\\ 1 & \text{with prob. } p_i - p_{ii},\\ -1 & \text{with prob. } p_i - p_{ii}. \end{cases}$$

In particular, $EA_i(Z) = 0$ and $EA_i^2(Z) = 2(p_i - p_{ii})$. Recall that $\mu_n = \sum_i(1 - p_{ii}/p_i)$ and consider

$$\chi^2_{2p}(\hat p, \hat q) - \mu_n = n^{-1}\sum_i(2p_i)^{-1}\Big(\sum_k A_i(Z_k)\Big)^2 - \mu_n$$
$$= n^{-1}\sum_{k \ne l}\sum_i(2p_i)^{-1}A_i(Z_k)A_i(Z_l) + n^{-1}\sum_k\sum_i(2p_i)^{-1}\big(A_i^2(Z_k) - 2(p_i - p_{ii})\big)$$
$$= (n-1)U_n^{(2)} + R_n^{(2)},$$

which parallels the representation in part (i). Regarding $R_n^{(2)}$, note that it is, as before, a zero mean sum of independent variables with variance

$$\mathrm{Var}R_n^{(2)} = \mathrm{Var}\Big[n^{-1}\sum_k\sum_i(2p_i)^{-1}A_i^2(Z_k)\Big] = n^{-1}\mathrm{Var}\Big[\sum_i(2p_i)^{-1}(\delta_i(X) - \delta_i(Y))^2\Big]$$
$$\le n^{-1}\sum_{i \ne j}\big((2p_i)^{-1} + (2p_j)^{-1}\big)^2 p_{ij} \le 2n^{-1}\sum_{i \ne j}\big((2p_i)^{-2} + (2p_j)^{-2}\big)p_{ij} \le n^{-1}\sum_i p_i^{-1},$$

where the first inequality above is obtained by applying the second moment bound and noticing that the inner sum consists of either two or zero summands, according to $X \ne Y$ or $X = Y$. The condition (3.5) implies that

$$R_n^{(2)}/\sqrt{m} = o_P(1). \qquad (B.4)$$

Regarding $U_n^{(2)}$, note that it is a U-statistic in the bivariate variables $Z_k$ with second order degenerate kernel given by $h_n(Z_1, Z_2) = \sum_i(2p_i)^{-1}A_i(Z_1)A_i(Z_2)$. Hence, in the notation of Lemma 5,

$$\sigma_n^2 = Eh_n^2(Z_1, Z_2) = E\Big[\sum_i(2p_i)^{-1}A_i(Z_1)A_i(Z_2)\Big]^2 = \sum_i(2p_i)^{-2}\big(EA_i^2(Z_1)\big)^2 + \sum_{i \ne j}(2p_i)^{-1}(2p_j)^{-1}\big(E[A_i(Z_1)A_j(Z_1)]\big)^2.$$

Since $EA_i^2(Z_1) = 2(p_i - p_{ii})$ and $E[A_i(Z_1)A_j(Z_1)] = -E[I(X_1 = i, Y_1 = j) + I(X_1 = j, Y_1 = i)] = -(p_{ij} + p_{ji})$ for $i \ne j$, it follows that $\sigma_n^2$ equals the variance given in (3.7). Recall (Remark 3.6) that under our assumptions $\sigma_n/\sqrt{m} \to 1$, and thus (B.4) implies $R_n^{(2)}/\sigma_n = o_p(1)$.

Now, in order to complete the proof as in part (i), we only need to show that the conditions (B.1) and (B.2) are satisfied for $U_n^{(2)}$, since then by Lemma 5 a statement similar to (B.3) holds for part (ii), namely

$$(n-1)U_n^{(2)}/(\sqrt{2}\,\sigma_n) \Rightarrow N(0,1). \qquad (B.5)$$

To this end note first that, by the definition of the $A$ variables, $A_i^{2p+1}(Z) = A_i(Z)$ and $A_i^{2p}(Z) = A_i^2(Z)$ for any integer $p \ge 1$. Additionally, since $Z$ is bivariate, for any distinct indices $(i, j, k, l)$ we have $E[A_i(Z)A_j(Z)A_k(Z)] = E[A_i(Z)A_j(Z)A_k(Z)A_l(Z)] = 0$. Consequently,

$$Eh_n^4 = E\Big[\sum_i(2p_i)^{-1}A_i(Z_1)A_i(Z_2)\Big]^4 \le 2^{-4}\Big\{\sum_i p_i^{-4}E[A_i^4(Z_1)A_i^4(Z_2)] + 6\sum_{i \ne j}p_i^{-2}p_j^{-2}E[A_i^2(Z_1)A_i^2(Z_2)A_j^2(Z_1)A_j^2(Z_2)]$$
$$+ 4\sum_{i \ne j}p_i^{-3}p_j^{-1}E[A_i^3(Z_1)A_i^3(Z_2)A_j(Z_1)A_j(Z_2)] + 0\Big\}$$
$$= 2^{-4}\Big\{4\sum_i p_i^{-4}(p_i - p_{ii})^2 + 6\sum_{i \ne j}p_i^{-2}p_j^{-2}(p_{ij} + p_{ji})^2 + 4\sum_{i \ne j}p_i^{-3}p_j^{-1}(p_{ij} + p_{ji})^2\Big\}$$
$$= 2^{-4}\Big\{4\sum_i p_i^{-4}(p_i - p_{ii})^2 + \sum_{i \ne j}\big(6p_i^{-2}p_j^{-2} + 4p_i^{-3}p_j^{-1}\big)(p_{ij} + p_{ji})^2\Big\}.$$

In view of the condition (3.6) and Remark 3.6, as well as (3.5),

$$\frac{Eh_n^4}{n\sigma_n^4} \le \frac{C}{n(m-2B)^2}\sum_i\big(p_i^{-2} + mB^2 + B^2p_i^{-1}\big) \le \frac{C}{nm^2}\sum_i p_i^{-2} \to 0$$

for some universal $C > 0$, and hence (B.1) holds. In order to argue (B.2), set $z_1 = (x_1, y_1)$ and $z_2 = (x_2, y_2)$. Then

$$\Psi_n(z_1, z_2) = E[h_n(Z, z_1)h_n(Z, z_2)]$$
$$= E\Big\{\big[(2p_{x_1})^{-1}(\delta_{x_1}(X) - \delta_{x_1}(Y)) - (2p_{y_1})^{-1}(\delta_{y_1}(X) - \delta_{y_1}(Y))\big]\big[(2p_{x_2})^{-1}(\delta_{x_2}(X) - \delta_{x_2}(Y)) - (2p_{y_2})^{-1}(\delta_{y_2}(X) - \delta_{y_2}(Y))\big]\Big\}.$$

It follows that

$$\Psi_n(z_1, z_2) = (4p_{x_1}p_{x_2})^{-1}\big(2p_{x_1}I(x_1 = x_2) - p_{x_1,x_2} - p_{x_2,x_1}\big) + (4p_{y_1}p_{y_2})^{-1}\big(2p_{y_1}I(y_1 = y_2) - p_{y_1,y_2} - p_{y_2,y_1}\big)$$
$$- (4p_{x_1}p_{y_2})^{-1}\big(2p_{x_1}I(x_1 = y_2) - p_{x_1,y_2} - p_{y_2,x_1}\big) - (4p_{y_1}p_{x_2})^{-1}\big(2p_{y_1}I(y_1 = x_2) - p_{y_1,x_2} - p_{x_2,y_1}\big)$$
$$=: R(x_1, x_2) + R(y_1, y_2) - R(x_1, y_2) - R(y_1, x_2),$$

where the last equality is the definition. Now consider

$$E\Psi_n^2(Z_1, Z_2) = ER^2(X_1, X_2) + ER^2(Y_1, Y_2) + ER^2(X_1, Y_2) + ER^2(Y_1, X_2)$$
$$+ 2E[R(X_1, X_2)R(Y_1, Y_2)] - 2E[R(X_1, X_2)R(X_1, Y_2)] - 2E[R(X_1, X_2)R(Y_1, X_2)] - 2E[R(Y_1, Y_2)R(X_1, Y_2)]$$
$$- 2E[R(Y_1, Y_2)R(Y_1, X_2)] + 2E[R(X_1, Y_2)R(Y_1, X_2)]$$
$$\le 4\big(ER^2(X_1, X_2) + ER^2(Y_1, Y_2) + ER^2(X_1, Y_2) + ER^2(Y_1, X_2)\big),$$

where the last inequality follows by applying the inequality $2|ab| \le a^2 + b^2$ to the integrands in the cross-product terms. To show that the quadratic terms above are of order $O(m)$, recall the assumption (3.6) and note that, using $(a-b)^2 \le a^2 + b^2$ for $a, b \ge 0$, we have

$$ER^2(X_1, X_2) = \sum_{x_1,x_2}\frac{p_{x_1}p_{x_2}}{16p_{x_1}^2p_{x_2}^2}\big(2p_{x_1}I(x_1 = x_2) - p_{x_1,x_2} - p_{x_2,x_1}\big)^2 \le \sum_{x_1,x_2}\frac{p_{x_1}p_{x_2}}{16p_{x_1}^2p_{x_2}^2}\big(4p_{x_1}^2I(x_1 = x_2) + 4B^2(p_{x_1}p_{x_2})^2\big) \le \frac{m + B^2}{4},$$

and via a similar argument it is easy to see that this bound also applies to the remaining quadratic terms. Thus, recalling Remark 3.6,

$$E\Psi_n^2(Z_1, Z_2)/\sigma_n^4 \le 4(m + B^2)/(m - 2B)^2 \to 0.$$

Therefore both (B.1) and (B.2) are satisfied and consequently (B.5) holds. In view of (B.4), the proof of part (ii) is completed. □

Proof of Theorem 3

Consider first part (i). By Taylor's expansion (note that the first-order term vanishes, since $\sum_i(\hat u_i - m^{-1}) = 0$)

$$\hat{\mathcal I}_\alpha(u) - \mathcal I_\alpha(u) = 0 + m^{1-\alpha}n^{-1}\binom{\alpha}{2}\chi^2_p(\hat u, u) + m^{1-\alpha}\binom{\alpha}{3}R_n, \qquad (B.6)$$

where

$$R_n = m\sum_i(\hat u_i - m^{-1})^2(m\hat u_i - 1)\big(\theta_i(m\hat u_i - 1) + 1\big)^{\alpha-3}$$

for some random $\theta_i \in (0,1)$. Since by Lemma 2 and Remark 3.5 the properly normalized variable $\chi^2_p(\hat u, u)$ is asymptotically normal under our assumptions, it only suffices to show that $\tilde R_n = nR_n/\sqrt{m} = o_p(1)$.

Fixing $\delta \in (0, 1/2)$, for any $\varepsilon > 0$ we have

$$P(|\tilde R_n| > \varepsilon) \le P\big(\max_i|m\hat u_i - 1| > \delta\big) + P\big(|\tilde R_n| > \varepsilon,\ \max_i|m\hat u_i - 1| \le \delta\big) =: (I) + (II).$$

Note that Lemma 3 and the Boole inequality imply, for $d$ large enough so that $d\tau > 1$,

$$(I) \le \sum_i P(|m\hat u_i - 1| > \delta) \le C_{2d}\,m\Big(\frac{m}{\delta^2 n}\Big)^d \le C_{2d}\,\delta^{-2d}mn^{-\tau d} \to 0$$

by assumption. Regarding (II), note that $\delta < 1/2$ and on the events $\{\omega : \max_i|m\hat u_i - 1| \le \delta\}$ we have the bound

$$|\tilde R_n| \le C\frac{n}{\sqrt{m}}\max_i|m\hat u_i - 1|\sum_i m(\hat u_i - m^{-1})^2 = C\max_i|m\hat u_i - 1|\Big(\frac{\chi^2_p(\hat u, u) - m}{\sqrt{m}} + \sqrt{m}\Big)$$

for some universal (i.e., $n$- and $\delta$-free) constant $C$, so that

$$(II) \le P\Big(\frac{|\chi^2_p(\hat u, u) - m|}{\sqrt{m}} > \frac{\varepsilon}{2C\delta}\Big) + P\Big(C\sqrt{m}\max_i|m\hat u_i - 1| > \frac{\varepsilon}{2}\Big) \le P\Big(\frac{|\chi^2_p(\hat u, u) - m|}{\sqrt{m}} > \frac{\varepsilon}{2C\delta}\Big) + C_{2d}\,m\Big(\frac{2C}{\varepsilon}\Big)^{2d}\Big(\frac{m^2}{n}\Big)^d.$$

Consequently, from the above considerations and Lemma 2 (denoting as before a standard normal variable by $N$), it follows that

$$\limsup_n P(|\tilde R_n| > \varepsilon) \le \limsup_n (I) + \limsup_n (II) \le P\big(\sqrt{2}|N| > \varepsilon(2C\delta)^{-1}\big) \le \gamma$$

for any small $\gamma > 0$ with $\delta$ sufficiently small. Therefore $\limsup_n P(|\tilde R_n| > \varepsilon) = 0$, which completes the proof of (i).
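Consider now the assertion (ii). The argument here is very similar to that of part (iii) in Theorem 1, and we only sketch it out for the sake of brevity. Denote $c_n = 1 + \binom{\alpha}{2}\frac{m-1}{n}$. It follows from (i) that

$$c_n^{-1}\hat{\mathcal I}_\alpha(u)/\mathcal I_\alpha(u) - 1 = c_n^{-1}\sum_i(m\hat u_i)^{\alpha}/m - 1 = o_p(1).$$

Hence, by virtue of the Taylor expansion (A.9),

$$\frac{(1-\alpha)\,n}{\binom{\alpha}{2}\sqrt{2m}}\Big[\hat{\mathcal H}_\alpha(u) - \log m - \frac{\log c_n}{1-\alpha}\Big] = \frac{1}{c_n}\Big[\frac{\chi^2_p(\hat u, u) - m + 1}{\sqrt{2m}} + \frac{\binom{\alpha}{3}}{\binom{\alpha}{2}}\frac{\tilde R_n}{\sqrt{2}}\Big] + T_n,$$

where $T_n$ now stands for the scaled quadratic term in the log expansion (A.9). Note that the assertion (ii) follows as soon as we show that $T_n = o_p(1)$. Similarly as in the proof of (i) above, it follows that on the events $\{|c_n^{-1}\hat{\mathcal I}_\alpha(u)/\mathcal I_\alpha(u) - 1| < \delta\}$ for $0 < \delta < 1/2$, we have $|T_n| \le C\frac{n}{\sqrt{m}}\big[c_n^{-1}\sum_i(m\hat u_i)^{\alpha}/m - 1\big]^2$ for sufficiently large $n$ and a universal (free of $n$ and $\delta$, as above) constant $C$. Therefore for any $\varepsilon > 0$

$$P(|T_n| > \varepsilon) = P\big(|T_n| > \varepsilon,\ |c_n^{-1}\hat{\mathcal I}_\alpha(u)/\mathcal I_\alpha(u) - 1| < \delta\big) + P\big(|T_n| > \varepsilon,\ |c_n^{-1}\hat{\mathcal I}_\alpha(u)/\mathcal I_\alpha(u) - 1| \ge \delta\big)$$
$$\le P\big(|T_n| > \varepsilon,\ |c_n^{-1}\hat{\mathcal I}_\alpha(u)/\mathcal I_\alpha(u) - 1| < \delta\big) + P\big(|c_n^{-1}\hat{\mathcal I}_\alpha(u)/\mathcal I_\alpha(u) - 1| \ge \delta\big)$$
$$\le P\Big(\frac{n}{\sqrt{m}}\Big|c_n^{-1}\sum_i\frac{(m\hat u_i)^{\alpha}}{m} - 1\Big| > \frac{\varepsilon}{C\delta}\Big) + o(1) = o(1),$$

using part (i), the fact that $c_n \to 1$, and that $\delta > 0$ may be arbitrarily small. The result follows. □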
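The expansion (B.6) can be checked numerically. For the uniform law $u$, $\hat{\mathcal I}_\alpha(u)/\mathcal I_\alpha(u) = m^{-1}\sum_i(m\hat u_i)^\alpha$, and the quadratic term of (B.6) divided by $\mathcal I_\alpha(u) = m^{1-\alpha}$ equals $\binom{\alpha}{2}\chi^2_p(\hat u, u)/n$. The sketch below (function name ours) returns both quantities so their closeness can be compared for counts near uniformity.

```python
def uniform_taylor_check(counts, alpha):
    """For the uniform law u on m = len(counts) cells, return (exact, quadratic) with
    exact = I_alpha(u_hat)/I_alpha(u) - 1 = m^{-1} sum_i (m u_hat_i)^alpha - 1 and
    quadratic = binom(alpha, 2) * chi2 / n, the leading term of the expansion (B.6)."""
    n, m = sum(counts), len(counts)
    exact = sum((m * c / n) ** alpha for c in counts) / m - 1
    expected = n / m
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    binom_alpha_2 = alpha * (alpha - 1) / 2
    return exact, binom_alpha_2 * chi2 / n


if __name__ == "__main__":
    for alpha in (0.5, 2.0, 3.0):
        print(alpha, uniform_taylor_check([26, 24, 25, 25], alpha))
```

For $\alpha = 2$ the two numbers coincide, because the third derivative of $x^2$ vanishes (equivalently, $\binom{2}{3} = 0$), so the remainder term in (B.6) is identically zero.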


Proof of Theorem 4

We shall only prove part (i), since part (ii) then follows similarly as in Theorems 1 and 2 and part (ii) of Theorem 3. Note

$$\hat{\mathcal I}_\alpha(p,q) - 1 = -\frac{\alpha(1-\alpha)}{n}\,\chi^2_{2p}(\hat p, \hat q) + R_n,$$

where

$$R_n = \sum_i\ \sum_{\{(k,l):\,k,l \ge 0,\ k+l=3\}}\frac{1}{k!\,l!}\,\frac{\partial^3(x^{\alpha}y^{1-\alpha})}{\partial x^k\,\partial y^l}\Big|_{(x,y)=(\tilde p_i,\tilde q_i)}(\hat p_i - p_i)^k(\hat q_i - q_i)^l =: \sum_{\{(k,l):\,k,l \ge 0,\ k+l=3\}}R_n(k,l),$$

with $(\tilde p_i, \tilde q_i) = (p_i, q_i) + \theta_i(\hat p_i - p_i, \hat q_i - q_i)$ and $0 < \theta_i < 1$ for all $i$. Due to (3.6) and Remark 3.6, as well as Lemma 2 (ii), it suffices to show that $\tilde R_n(k,l) = nR_n(k,l)/\sqrt{m} = o_p(1)$ for $k \ge 0$, $l \ge 0$ such that $k + l = 3$. Due to the invariance of $R_n$ under swapping $p_i^{\alpha}$ and $q_i^{1-\alpha}$, it suffices to show the above only for the pairs $(k = 3, l = 0)$ and $(k = 2, l = 1)$. To this end, note


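$$\begin{aligned}
\tilde R_n(3,0) &= \binom{\alpha}{3}\frac{n}{\sqrt m}\sum_i \tilde p_i^{\,\alpha-3}\,q_i^{1-\alpha}(\hat p_i - p_i)^3\\
&= \binom{\alpha}{3}\frac{n}{\sqrt m}\sum_i \Big(\frac{q_i}{p_i}\Big)^{1-\alpha}\Big(1 + \theta_i\,\frac{\hat p_i - p_i}{p_i}\Big)^{\alpha-3}\frac{(\hat p_i - p_i)^2}{p_i}\Big(\frac{\hat p_i}{p_i} - 1\Big),
\end{aligned}$$

where $|\theta_i| \le 1$ for all $i$. For $0 < \delta < 1/2$ and $\varepsilon > 0$, let $A_n(\delta) = \{\omega : \max_i |\hat p_i/p_i - 1| > \delta \text{ or } \max_i |\hat q_i/q_i - 1| > \delta\}$. Then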

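$$\begin{aligned}
P(|\tilde R_n(3,0)| > \varepsilon) &= P\big(\{|\tilde R_n(3,0)| > \varepsilon\} \cap A_n(\delta)\big) + P\big(\{|\tilde R_n(3,0)| > \varepsilon\} \cap A_n^c(\delta)\big)\\
&\le P(A_n(\delta)) + P\big(\{|\tilde R_n(3,0)| > \varepsilon\} \cap A_n^c(\delta)\big).
\end{aligned}$$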
Note that for large $n$ we have $P(A_n(\delta)) < \gamma$ for arbitrarily small $\gamma > 0$, due to

$$P(A_n(\delta)) \le P\big(\max_i|\hat p_i/p_i - 1| > \delta\big) + P\big(\max_i|\hat q_i/q_i - 1| > \delta\big) < \gamma/2 + \gamma/2,$$

which follows from Lemma 3 and Boole's inequality, as before. Recall from Lemma 2 that $\mu_n = \sum_i (1 - p_i)$. Then

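$$\begin{aligned}
P\big(\{|\tilde R_n(3,0)| > \varepsilon\} \cap A_n^c(\delta)\big) &\le P\Big(\frac{Cn}{\sqrt m}\max_i|\hat p_i/p_i - 1|\sum_i(\hat p_i - p_i)^2/p_i > \varepsilon\Big)\\
&\le P\Big(\frac{n}{\sqrt m}\Big|\sum_i(\hat p_i - p_i)^2/p_i - \mu_n/n\Big| > \frac{\varepsilon}{2C\delta}\Big) + P\Big(C\sqrt m\,\max_i|\hat p_i/p_i - 1| > \frac{\varepsilon}{2}\Big) =: (\mathrm{Ia}) + (\mathrm{Ib}).
\end{aligned}$$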

Note that (3.8) implies in particular (3.5), and therefore, due to the CLT result in (i) of Lemma 2, we have $(\mathrm{Ia}) < \gamma/2$ for arbitrarily small $\gamma > 0$ with $n$ large enough and $\delta$ small enough, whereas for (Ib)


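$$(\mathrm{Ib}) \le P\Big(\sqrt m\,\max_i|\hat p_i/p_i - 1| > \frac{\varepsilon}{2C}\Big) \le C m\Big(\frac{m}{n\min_i p_i}\Big)^{d} \le C m\,n^{-\kappa d} < \gamma/2$$

for sufficiently large $d$ (with a possibly different constant $C$), due to (3.8), Boole's inequality and Lemma 3 (cf. the previous proof). Consequently, for arbitrarily small $\gamma$ and large $n$,

$$P(|\tilde R_n(3,0)| > \varepsilon) \le 2\gamma.$$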
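The role of the event $A_n(\delta)$ can be illustrated numerically. The sketch below (Python with NumPy) estimates $P(\max_i|\hat p_i/p_i - 1| > \delta)$, the $p$-part of $P(A_n(\delta))$, and shows it vanishing as $n$ grows with $m$ fixed. The power-law weights $p_i \propto 1/i$, the value $\delta = 0.25$, the sample sizes and the helper name `prob_max_ratio_exceeds` are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

def prob_max_ratio_exceeds(p, n, delta, reps, seed=2):
    """Estimate P(max_i |p_hat_i / p_i - 1| > delta) for multinomial(n, p) samples."""
    rng = np.random.default_rng(seed)
    p_hat = rng.multinomial(n, p, size=reps) / n
    max_dev = np.max(np.abs(p_hat / p - 1.0), axis=1)  # max relative deviation per replicate
    return np.mean(max_dev > delta)

m = 50
p = 1.0 / np.arange(1, m + 1)
p /= p.sum()  # hypothetical power-law cell probabilities, p_i proportional to 1/i

for n in (2_000, 20_000, 200_000):
    print(n, prob_max_ratio_exceeds(p, n, delta=0.25, reps=500))
```

The smallest cell drives the maximum: $|\hat p_i/p_i - 1|$ has standard deviation of order $(n p_i)^{-1/2}$, which is why the bound for (Ib) is stated in terms of $n\min_i p_i$.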

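For the term $\tilde R_n(2,1)$ note that

$$\begin{aligned}
\tilde R_n(2,1) &= \binom{\alpha}{2}(1-\alpha)\frac{n}{\sqrt m}\sum_i \frac{\tilde p_i^{\,\alpha-2}}{\tilde q_i^{\,\alpha}}(\hat p_i - p_i)^2(\hat q_i - q_i)\\
&= \binom{\alpha}{2}(1-\alpha)\frac{n}{\sqrt m}\sum_i \Big(1 + \theta_i\,\frac{\hat q_i - q_i}{q_i}\Big)^{-\alpha}\Big(1 + \theta_i\,\frac{\hat p_i - p_i}{p_i}\Big)^{\alpha-2}\Big(\frac{q_i}{p_i}\Big)^{1-\alpha}\frac{(\hat p_i - p_i)^2}{p_i}\Big(\frac{\hat q_i}{q_i} - 1\Big).
\end{aligned}$$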

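The argument above then also applies to bounding the probability $P(|\tilde R_n(2,1)| > \varepsilon)$ from above, with the obvious modification that

$$\begin{aligned}
P\big(\{|\tilde R_n(2,1)| > \varepsilon\} \cap A_n^c(\delta)\big) &\le P\Big(\frac{Cn}{(1-\delta)^{2}\sqrt m}\max_i|\hat q_i/q_i - 1|\sum_i(\hat p_i - p_i)^2/p_i > \varepsilon\Big)\\
&\le P\Big(\frac{n}{\sqrt m}\Big|\sum_i(\hat p_i - p_i)^2/p_i - \mu_n/n\Big| > \frac{(1-\delta)^2\varepsilon}{2C\delta}\Big) + P\Big(\frac{C\sqrt m}{(1-\delta)^2}\max_i|\hat q_i/q_i - 1| > \frac{\varepsilon}{2}\Big) =: (\mathrm{IIa}) + (\mathrm{IIb})
\end{aligned}$$

for sufficiently small $\delta > 0$. One may then show that $(\mathrm{IIa}) < \gamma/2$ and $(\mathrm{IIb}) < \gamma/2$, and thus $P(|\tilde R_n(2,1)| > \varepsilon) < 2\gamma$, and part (i) follows. $\square$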
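The third-order Taylor coefficients entering $R_n(3,0)$ and $R_n(2,1)$ are $\frac{1}{3!}\partial_p^3(p^{\alpha}q^{1-\alpha}) = \binom{\alpha}{3}p^{\alpha-3}q^{1-\alpha}$ and $\frac{1}{2!\,1!}\partial_p^2\partial_q(p^{\alpha}q^{1-\alpha}) = \binom{\alpha}{2}(1-\alpha)p^{\alpha-2}q^{-\alpha}$. They can be checked with central finite differences. Below is a minimal Python sketch using only the standard library; the evaluation point, $\alpha$, the step size $h$ and all function names are arbitrary illustrative choices.

```python
import math

def f(p, q, alpha):
    """The summand p^alpha * q^(1 - alpha) of y_alpha(p, q)."""
    return p ** alpha * q ** (1.0 - alpha)

def binom(a, k):
    """Generalized binomial coefficient a(a-1)...(a-k+1)/k! for real a."""
    out = 1.0
    for j in range(k):
        out *= (a - j) / (j + 1)
    return out

def d3_p(p, q, alpha, h=1e-3):
    """Central finite difference for the third partial derivative in p."""
    return (f(p + 2 * h, q, alpha) - 2 * f(p + h, q, alpha)
            + 2 * f(p - h, q, alpha) - f(p - 2 * h, q, alpha)) / (2 * h ** 3)

def d2p_dq(p, q, alpha, h=1e-3):
    """Central difference in q of the central second difference in p."""
    def d2p(qq):
        return (f(p + h, qq, alpha) - 2 * f(p, qq, alpha) + f(p - h, qq, alpha)) / h ** 2
    return (d2p(q + h) - d2p(q - h)) / (2 * h)

p, q, alpha = 0.3, 0.5, 1.7
coef30 = d3_p(p, q, alpha) / math.factorial(3)
coef21 = d2p_dq(p, q, alpha) / (math.factorial(2) * math.factorial(1))
exact30 = binom(alpha, 3) * p ** (alpha - 3) * q ** (1 - alpha)
exact21 = binom(alpha, 2) * (1 - alpha) * p ** (alpha - 2) * q ** (-alpha)
print(coef30, exact30)
print(coef21, exact21)
```

The finite-difference error is of order $h^2$, so the two columns should agree to several significant digits.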